I am trying to create a regex pattern that reads through a BibTeX citation file and matches everything inside the braces of each entry. For those who don't know, BibTeX citations look like the following:
@INPROCEEDINGS{Fogel95,
AUTHOR = {L. J. Fogel and P. J. Angeline and D. B. Fogel},
TITLE = {An evolutionary programming approach to self-adaptation
on finite state machines},
BOOKTITLE = {Proceedings of the Fourth International Conference on
Evolutionary Programming},
YEAR = {1995},
pages = {355--365}
}
@ARTICLE{Goldberg91,
AUTHOR = {D. Goldberg},
TITLE = {Real-coded genetic algorithms, virtual alphabets, and blocking},
JOURNAL = {Complex Systems},
YEAR = {1991},
pages = {139--167}
}
@INPROCEEDINGS{Yao96,
AUTHOR = {X. Yao and Y. Liu},
TITLE = {Fast evolutionary programming},
BOOKTITLE = {Proceedings of the 6$^{th}$ Annual Conference on Evolutionary
Programming},
YEAR = {1996},
pages = {451--460}
}
The current pattern I have is as follows:
@(\\w+)\\{(\\w+),\\s*((\\w+)\\s*=\\s*(\\"|\\{)?(.+)(\\"|\\})?,?\\s*)+\\}
This pattern matches the second citation but only parts of the first and third. I know the reason it doesn't match the third citation: there are nested brackets inside a field value, on the right-hand side of the "=" sign ( 6$^{th}$ ). I have also figured out that it won't match citations whose field values contain a newline, for example:
BOOKTITLE = {Proceedings of the Fourth International Conference on
Evolutionary Programming},
//This part of the citation has a newline in the middle of it.
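To illustrate the newline problem in isolation: assuming the pattern is used from Java (the doubled backslashes above are written as they would be in a Java string literal), the dot does not match line terminators unless Pattern.DOTALL is set. This is only a minimal sketch; the class name DotNewlineDemo and the method matchesBraced are made up for this illustration.

```java
import java.util.regex.Pattern;

public class DotNewlineDemo {

    // Hypothetical helper: true if the whole input is "{", then one or more
    // characters matched by ".", then "}", under the given regex flags.
    public static boolean matchesBraced(String input, int flags) {
        return Pattern.compile("\\{(.+)\\}", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String field = "{Conference on\nEvolutionary Programming}";
        // By default "." stops at the newline, so the value is not matched.
        System.out.println(matchesBraced(field, 0));              // prints false
        // With DOTALL, "." also matches line terminators.
        System.out.println(matchesBraced(field, Pattern.DOTALL)); // prints true
    }
}
```

Note that DOTALL together with the greedy (.+) would probably swallow far more than one field, so this explains the symptom rather than fixing the pattern.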
I have been slaving away trying to fix my pattern, but the longer I spend adding new conditions to a regular expression, the more confusing it gets. How do I capture a whole citation regardless of inner brackets or parentheses? Note that some citations have no brackets or quotes around the value after the "=" sign at all. Any help, along with an explanation, would be greatly appreciated. I have looked at similar examples, but they only confused me more, because a regular expression is so hard to decipher by simply glancing at it. Thank you.
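One direction that avoids handling nesting inside the regex at all is to count brackets and cut the file into whole entries first, then parse the fields of each entry separately. The following is a rough sketch of that idea, not a full BibTeX parser. It assumes Java, assumes the curly brackets in the file are balanced, and ignores escaped brackets and "@" characters that appear between entries; BibEntrySplitter and splitEntries are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class BibEntrySplitter {

    // Returns each top-level "@TYPE{...}" entry as one string by tracking
    // bracket depth, so inner brackets and newlines need no special handling.
    public static List<String> splitEntries(String text) {
        List<String> entries = new ArrayList<>();
        int start = 0;
        while ((start = text.indexOf('@', start)) >= 0) {
            int open = text.indexOf('{', start);
            if (open < 0) {
                break; // "@" with no opening bracket after it
            }
            int depth = 0;
            int pos = open;
            for (; pos < text.length(); pos++) {
                char c = text.charAt(pos);
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    break; // found the bracket that closes this entry
                }
            }
            if (pos == text.length()) {
                break; // unbalanced input: stop instead of guessing
            }
            entries.add(text.substring(start, pos + 1));
            start = pos + 1;
        }
        return entries;
    }

    public static void main(String[] args) {
        String sample = "@ARTICLE{A1,\n TITLE = {x {y} z}\n}\n"
                + "@MISC{B2,\n NOTE = {6$^{th}$ Annual\nConference}\n}";
        for (String entry : splitEntries(sample)) {
            System.out.println(entry);
            System.out.println("----");
        }
    }
}
```

Each returned string is one complete citation, including nested brackets and embedded newlines, so a much simpler per-field pattern could then be applied to it.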