09 June 2010

Regular expression to split a string

Parsing flat files can be quite annoying, as the NEXUS format in bioinformatics shows. Regular expressions help, and here is a fun one that splits a string on the ';' character, except when the ';' is enclosed in [] (which marks a comment in NEXUS).

(.*?)(?<!\[[^\]]*?);

This can be the basis of any pattern that needs to skip over comments, although I have not tested it with nested comments. Thankfully, that was beyond the scope of what I needed. Note that it relies on a variable-length lookbehind, which not every regex engine supports.
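
For anyone who wants to try the same idea in Python, here is a minimal sketch. The post does not say which regex engine the pattern was written for, and Python's built-in re module only accepts fixed-width lookbehind, so this sketch swaps in a different technique: it consumes each [...] comment as a single unit instead of looking behind. The names STATEMENT and split_statements and the sample string are made up for illustration, and like the original it assumes comments are not nested.

```python
import re

# Not the lookbehind pattern from the post: each piece is a run of either
# whole [...] comments or characters that are neither ';' nor '[',
# and it must be terminated by a ';'.
STATEMENT = re.compile(r"((?:\[[^\]]*\]|[^;\[])*);")


def split_statements(text):
    """Return the ';'-terminated pieces of text, ignoring ';' inside [...]."""
    return [m.group(1).strip() for m in STATEMENT.finditer(text)]


# Hypothetical NEXUS-like input with a ';' inside a comment.
sample = "BEGIN TAXA; DIMENSIONS NTAX=2 [a; comment]; END;"
print(split_statements(sample))
# ['BEGIN TAXA', 'DIMENSIONS NTAX=2 [a; comment]', 'END']
```

Any trailing text that is not followed by a ';' is dropped, and an unclosed '[' is not handled specially, so treat this as a starting point and not a full NEXUS tokenizer.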
