I am trying to use font-lock and elisp's regex to highlight something like this:

class Foo implements A, B, C, D { }

The problem is the unknown length of the comma-separated list after implements. I've already written a regex that matches every word in the list (checked with re-builder: A, B, C and D are all highlighted):

"implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"

but I'm unable to combine this with font-lock.

Obviously

'("implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"
  (1 font-lock-type-face) (2 font-lock-type-face))

doesn't work, because it highlights only the last two occurrences (C and D), ignoring the star (*) after the first capture group.

Is there a way to capture a list of all matched words, or maybe an entirely different way to solve this problem?

Grzegorz Rożniecki

2 Answers

If you don't want the commas to be highlighted too, your approach cannot work. When you use a subexp-highlighter of the form

(subexp facespec)

the subexp refers to the sub-group of your regex and highlights with the given facespec. Now, a sub-group of a regexp match is a continuous span of text with a beginning and end. In fact, whenever you do a regexp search, you can query those values with the functions (match-beginning subexp) and (match-end subexp).

But that means that you cannot match a variable number of class names, excluding the commas, with a single sub-expression, because that sub-expression would have to be a continuous span. A continuous span that covers a variable number of class names must always contain the commas too; there's no way around that.

Here's another reason why your approach is not such a good idea: your regexp explicitly uses whitespace. It doesn't matter if the whitespace is excluded from highlighting, but even using it in the regexp is not such a great idea, because wherever whitespace is allowed, you could always encounter comments as well.
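To see the "one span per sub-group" limitation directly, here is a minimal sketch that runs the question's regexp in a temporary buffer and returns what groups 1 and 2 hold afterwards. The helper name `my-implements-groups` is hypothetical; only `re-search-forward` and `match-string` are standard Emacs functions. Whatever the repeated group reports, it is a single span, never a list of A, B and C.

```elisp
;; `my-implements-groups' is a hypothetical helper name, not part of Emacs.
(defun my-implements-groups (line)
  "Search LINE with the question's regexp and return groups 1 and 2 as a list.
Return nil if the regexp does not match."
  (with-temp-buffer
    (insert line)
    (goto-char (point-min))
    (when (re-search-forward
           "implements\\s-+\\(?:\\(\\sw+\\)\\s-*,\\s-*\\)*\\(\\sw+\\)"
           nil t)
      ;; Each group exposes one beginning and one end, no matter how
      ;; many times the enclosing `*' repeated it.
      (list (match-string 1) (match-string 2)))))

(my-implements-groups "class Foo implements A, B, C, D { }")
```

Evaluating the last form (for example with C-x C-e) shows the two strings that a (1 ...) and a (2 ...) highlighter would get to work with.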

Consider the following code line:

class Foo implements A, /*B, C,*/ D { }

In that case, you would want the characters in the span /*B, C,*/ to be highlighted using font-lock-comment-face, and the surrounding classes in font-lock-type-face. You can still achieve this effect if you highlight comments only after everything else has already been highlighted, and allow comments to override other font-lock matches. But this will lead to rather inefficient matching, because every comment would first be highlighted as if it were code, and then be highlighted as a comment in a second pass.

A solution to both problems would probably be to split the matching of the keyword ("implements") and the classes into two separate matching rules. As a starting point, you could use something along the lines of:

'(("\\bimplements\\b" . font-lock-keyword-face)
  ("\\b[A-Z]\\w*\\b" . font-lock-type-face))
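One way to try these two rules out, as a minimal sketch: the variable and function names below are hypothetical, and hooking into java-mode is only an assumption about where the rules are wanted. `font-lock-add-keywords` and `font-lock-flush` are standard font-lock functions (the latter needs Emacs 25 or newer).

```elisp
;; Hypothetical names; only the two rules themselves come from the answer.
(defvar my-implements-font-lock-keywords
  '(("\\bimplements\\b" . font-lock-keyword-face)
    ("\\b[A-Z]\\w*\\b" . font-lock-type-face))
  "Two independent font-lock rules: the keyword, and capitalized names.")

(defun my-enable-implements-highlighting ()
  "Add the two rules to the current buffer and refontify it."
  ;; A nil MODE argument adds the keywords to the current buffer only.
  (font-lock-add-keywords nil my-implements-font-lock-keywords)
  (font-lock-flush))

;; Assumption: the rules are wanted in Java buffers.
(add-hook 'java-mode-hook #'my-enable-implements-highlighting)
```

Note that the second rule highlights every capitalized word in the buffer, not only those following implements, so it is a starting point rather than a precise matcher.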
Thomas
  • I didn't solve the initial problem, tried to check cc-mode / java-mode sources, but that solution breaks when line break is added... So I guess it's more like you explained above. Thanks! – Grzegorz Rożniecki Dec 13 '11 at 19:16

Something like this seems to work here:

'("\\(implements\\)\\s-+\\(\\(\\sw+\\s-*,\\s-*\\)*\\sw+\\)"
   (1 font-lock-warning-face)
   (2 font-lock-keyword-face))

(and obviously you probably want different faces...)

Rupert Swarbrick