I am just starting to learn regex, and I was wondering if it could be used to parse and style HTML/XHTML code. After reading this hilarious answer, I realized that it can't be done. My question is: how do programs and text editors like Dreamweaver, Notepad++, and Sublime Text color-code their code? It's obviously possible by some other means; I'm just curious how it is done. My hunch is a long list of keywords mixed with some regex. What do you guys think?
- You CAN parse HTML with RegEx*: http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html I wouldn't recommend it though – Cas Nouwens Nov 27 '13 at 13:46
- Different syntax highlighters work in different ways. Plenty of them are open source though, so you could pick a few and compare. – Quentin Nov 27 '13 at 13:49
- That's a very nice article @CasNouwens! – CompuChip Nov 27 '13 at 13:57
- It is starred in the RegEx chatroom, where I am a regular customer XD. Thanks anyway @CompuChip – Cas Nouwens Nov 27 '13 at 14:34
1 Answer
The essential concept is fairly straightforward:
They parse the code, according to its grammar, and mark different parts of the text as belonging to the appropriate grammar class.
Then, when text is drawn, its color is determined by the attached grammar class.
In practice, syntax highlighters often use a simplified grammar, which gives a solution that works for most code but can get confused by uncommon structures. This trade-off is typically worth it.
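To make that concrete, here is a minimal Python sketch of the two steps above for HTML. It is not how any particular editor does it; the token class names, the colors, and the helper names `tokenize` and `highlight` are all made up for illustration. It uses a deliberately simplified grammar: two small sets of regex rules, one for text outside a tag and one for text inside a tag, with a flag to switch between them.

```python
import re

# Hypothetical grammar classes and the color each one is drawn in.
COLORS = {"tag": "blue", "attribute": "red", "string": "green",
          "comment": "gray", "text": "black"}

# Rules that apply outside a tag: comments, tag openers, plain text.
OUTSIDE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<tag></?[A-Za-z][\w-]*)"
    r"|(?P<text>[^<]+|<)",
    re.S,
)

# Rules that apply inside a tag: quoted strings, names, the closing bracket.
INSIDE = re.compile(
    r"""(?P<string>"[^"]*"|'[^']*')"""
    r"""|(?P<attribute>[^\s"'=<>/]+)"""
    r"""|(?P<tag>/?>)"""
    r"""|(?P<text>\s+|.)""",
    re.S,
)

def tokenize(source):
    """Step 1: yield (grammar_class, text) pairs covering the whole input."""
    pos, in_tag = 0, False
    while pos < len(source):
        match = (INSIDE if in_tag else OUTSIDE).match(source, pos)
        kind, text = match.lastgroup, match.group()
        if kind == "tag":
            # "<a" or "</a" enters a tag; ">" or "/>" leaves it.
            in_tag = not text.endswith(">")
        yield kind, text
        pos = match.end()

def highlight(source):
    """Step 2: look up the color from the attached grammar class."""
    return [(COLORS[kind], text) for kind, text in tokenize(source)]

for color, text in highlight('<a href="x">Hi</a>'):
    print(f"{color:>6}  {text!r}")
```

The simplification shows up quickly: this sketch knows nothing about `<script>` or `<style>` bodies, so a `<b` inside a JavaScript string would still be colored as a tag. A real highlighter either adds more states for such cases or accepts the occasional wrong color.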

Williham Totland