I am looking for a regular expression in R to extract the field names from an .sdf chemical data file. Each field name is enclosed in < > and follows a "> " at the start of a line. E.g. in the case of
string="> <FIELD1>\nfield text1\n\n> <FIELD2>\nfield text2\n\n> <FIELD3>field text3"
it would have to return
fields=c("FIELD1","FIELD2","FIELD3")
(The fields could occur multiple times, so I would only need the unique() ones.)
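To make the requirement concrete, here is a rough sketch of the kind of thing I have in mind. It is not a tested solution for real .sdf files. It assumes every field line starts with exactly "> <" and that the name contains no ">" or newline. The helper name extract_sdf_fields is made up for illustration only.

```r
# Hypothetical helper (name is illustrative, not from any package).
# Assumes each field line begins with exactly "> <".
extract_sdf_fields <- function(string) {
  # (?m) lets ^ match at the start of every line, not just the start of the string.
  hits <- regmatches(string, gregexpr("(?m)^> <[^>\n]+>", string, perl = TRUE))[[1]]
  # Strip the leading "> <" and the trailing ">" to leave only the name.
  unique(sub("^> <(.*)>$", "\\1", hits))
}

string <- "> <FIELD1>\nfield text1\n\n> <FIELD2>\nfield text2\n\n> <FIELD3>field text3"
fields <- extract_sdf_fields(string)
fields
# [1] "FIELD1" "FIELD2" "FIELD3"
```

Matching the whole "> <NAME>" header first and trimming it afterwards keeps the regular expression simple. A "<...>" that shows up in the middle of a line of field text is ignored because of the ^ anchor.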
Any thoughts?
cheers, Tom