Unknown number of matches in regex and font-lock

Question

I am trying to use font-lock and elisp's regex to highlight something like this:

class Foo implements A, B, C, D { }

The problem is the unknown length of the comma-separated list after implements. I've already written a regex that matches every word in the list (checked with re-builder: A, B, C and D are all highlighted):

"implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"

but I'm unable to combine this with font-lock.

Obviously

'("implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"
  (1 font-lock-type-face) (2 font-lock-type-face))

doesn't work, because it highlights only the last two occurrences (C and D), ignoring the star (*) after the first capture group.

Is there a way to capture a list of all matched words, or maybe an entirely different way to solve this problem?

score 2 · Accepted Answer · answered May 29 '11 at 22:06

If you don't want the commas to be highlighted too, your approach cannot work. When you use a subexp-highlighter of the form

(subexp facespec)

the subexp refers to the sub-group of your regex and highlights with the given facespec. Now, a sub-group of a regexp match is a continuous span of text with a beginning and end. In fact, whenever you do a regexp search, you can query those values with the functions (match-beginning subexp) and (match-end subexp).
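
As a rough illustration of those two functions (a sketch that reuses the sample line and regexp from the question, not something from the original answer), the snippet below collects the single position pair that each group reports. Since the question observes that only C and D get highlighted, group 1 presumably keeps just the last iteration of the repeated part:

```elisp
;; Illustration only: the sample line and regexp are taken from the question.
(let ((line "class Foo implements A, B, C, D { }")
      (re "implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"))
  (when (string-match re line)
    ;; Each group yields exactly one (beginning . end) pair, even though
    ;; group 1 sits inside a repeated construct.
    (list (cons (match-beginning 1) (match-end 1))
          (match-string 1 line)
          (cons (match-beginning 2) (match-end 2))
          (match-string 2 line))))
```

Evaluating this in a scratch buffer (for example with `C-x C-e` after the closing parenthesis) shows which span each group ends up holding.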

But that means that you cannot match a variable number of class names, excluding the commas, with a single sub-expression, because that sub-expression would have to be one continuous span. A continuous span that covers a variable number of class names must always contain the commas too; there's no way around that.

Here's another reason why your approach is not such a good idea: your regexp explicitly matches whitespace. It doesn't matter that the whitespace is excluded from highlighting; matching it in the regexp at all is risky, because wherever whitespace is allowed, you could always encounter comments as well.

Consider the following code line:

class Foo implements A, /*B, C,*/ D { }

In that case, you would want the characters in the span /*B, C,*/ to be highlighted using the font-lock-comment-face, and the surrounding classes in font-lock-type-face. You can still achieve this effect if you highlight comments only after everything else has already been highlighted, and allow comments to override other font-lock matches. But this will lead to rather inefficient matching, because every comment would then first be highlighted as if it were code, and then be highlighted as a comment in a second pass.

A solution to both problems would probably be to split the matching of the keyword ("implements") and of the classes into two separate rules. As a starting point, you could use something along the lines of:

'(("\\bimplements\\b" . font-lock-keyword-face)
  ("\\b[A-Z]\\w*\\b" . font-lock-type-face))
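
If the class list should stay tied to the keyword instead of relying on capitalization, font-lock's anchored highlighters may be worth a try. The following is only a sketch under stated assumptions, not something the answer above proposes: `my-implements-keywords` and `my-implements-setup` are hypothetical names, and `java-mode-hook` is an assumed target mode. The main regexp matches the keyword plus the first class name; the anchored regexp is then searched repeatedly from the end of that match, so every further name becomes a match of its own and the commas stay unhighlighted.

```elisp
;; Sketch only. `my-implements-keywords' and `my-implements-setup' are
;; hypothetical names, and `java-mode-hook' is an assumed target mode.
(defvar my-implements-keywords
  '(("\\b\\(implements\\)\\s-+\\(\\sw+\\)"
     (1 font-lock-keyword-face)          ; the keyword itself
     (2 font-lock-type-face)             ; first class name in the list
     ;; Anchored highlighter: searched repeatedly, starting at the end
     ;; of the main match.  `\\=' forces each match to begin at point,
     ;; so the loop stops at the first text that is not ", Name".
     ("\\=\\s-*,\\s-*\\(\\sw+\\)" nil nil
      (1 font-lock-type-face))))
  "Font-lock rules for an `implements' list (illustrative).")

(defun my-implements-setup ()
  "Add the illustrative `implements' rules to the current buffer."
  (font-lock-add-keywords nil my-implements-keywords))

(add-hook 'java-mode-hook #'my-implements-setup)
```

By default the anchored search stops at the end of the line, so a list that wraps onto the next line is not covered, and comments between the names are not handled by this sketch either.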

I didn't solve the initial problem, tried to check cc-mode / java-mode sources, but that solution breaks when line break is added... So I guess it's more like you explained above. Thanks! — Grzegorz Rożniecki, Dec 13 '11 at 19:16

score 1 · Answer 2 · answered May 26 '11 at 09:20

Something like this seems to work here:

'("\\(implements\\)\\s-+\\(\\(\\sw+\\s-*,\\s-*\\)*\\sw+\\)"
   (1 font-lock-warning-face)
   (2 font-lock-keyword-face))

(and obviously you probably want different faces...)

Rupert Swarbrick


It's not exactly what I wanted, because it highlights the commas (and the whitespace between the words, because both are part of the match). – Grzegorz Rożniecki May 26 '11 at 10:17
P.S. It's `font-lock-keyword-face` for first match (implements) and `font-lock-type-face` for rest ;) – Grzegorz Rożniecki May 26 '11 at 10:20
