I'm trying to learn how to write emacs major-modes. There are lots of great tutorials online (e.g. http://www.emacswiki.org/emacs/GenericMode), but I'm struggling to learn the syntax for regex matching. For example, from this answer I'm trying to understand why

'(("\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""

from

(define-derived-mode rich-text-mode text-mode "Rich Text"
  "text mode with string highlighting."

  ;;register keywords
  (setq rich-text-font-lock-keywords
        '(("\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\"" 0 font-lock-string-face)))
  (setq font-lock-defaults rich-text-font-lock-keywords)
  (font-lock-mode 1))

matches anything between double quotation marks. This material: http://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Special.html#Regexp-Special doesn't seem to explain that.

Are there any better resources out there?

DilithiumMatrix
  • When presenting a node "Regexp-Special", which obviously covers some special cases, you should be able to look upward at "Syntax of Regexps" rather than down-vote people trying to help. – Andreas Röhler Sep 03 '13 at 08:13

1 Answer

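To answer your question about what the regexp does: the regexp in the example you cite is actually "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\"". The '(( at the start of the line you quoted belongs to the surrounding quoted list, not to the regexp.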
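
Because the regexp is written as a Lisp string, the Lisp reader processes its backslashes before the regexp engine ever sees it. Here is a minimal sketch for checking what the engine receives, assuming you evaluate it in `*scratch*` or with `M-:`. The variable name `my-quoted-re` is hypothetical and used only for this illustration.

```elisp
;; Hypothetical name, used only for this illustration.
(defvar my-quoted-re "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""
  "The regexp from the question, written as a Lisp string.")

;; `princ' prints the string without read-syntax escapes, so the output
;; is the text the regexp engine receives: "\(\(?:.\| then a literal
;; newline character, then \)*?[^\]\)"
(princ my-quoted-re)

;; The reader turned each \" into ", each \\ into \, and \n into a
;; newline, so the string is shorter than its source text suggests.
(length my-quoted-re)   ; => 22

;; The first character is a plain double quote, the second a backslash.
(aref my-quoted-re 0)   ; => 34, i.e. ?\"
(aref my-quoted-re 1)   ; => 92, i.e. ?\\
```

Reading the regexp in this "after the reader" form tends to make the individual parts easier to pick out.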

The parts to match are:

  • \", which matches only a " char. The backslash is needed only because the regexp is written inside a Lisp string. This appears at the beginning and the end of the regexp.

  • A group, which contains \\(?:.\\|\n\\)*? followed by [^\\]. The group is presumably there so that font-lock-keywords can be told to do something with that part of a match, i.e., the part between the matching " at the beginning and end.

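  • \\(?:.\\|\n\\)*?, the first part of the group, matches zero or more characters --- any characters. The \\(?: ... \\) is a non-capturing ("shy") group; it is there only so that *? applies to the whole alternation. The . matches any char except a newline char, and the \n matches a newline char. The \\| means either of those is OK. The *? is the non-greedy version of *: it matches as few characters as possible, so the match ends at the first acceptable closing " rather than at the last one in the text.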

  • [^\\] matches any character except a backslash (\).

So putting it together, the group matches zero or more chars followed by a char that is not a backslash. Why not just use a regexp that matches zero or more chars between " chars? Presumably because the person wanted to make sure the ending " was not escaped (by a backslash). However, note that the regexp requires there to be at least one char between the " chars, so that regexp does not match the empty string, "".
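
The behavior described above can be checked with `string-match`. This is only a sketch: the names `my-quoted-re`, `my-quoted-re-greedy`, and `my-first-match` are hypothetical, and the greedy variant is not in the original code; it is included only for comparison.

```elisp
;; Hypothetical names, used only for this illustration.
(defvar my-quoted-re "\"\\(\\(?:.\\|\n\\)*?[^\\]\\)\""
  "The regexp from the question (non-greedy).")

(defvar my-quoted-re-greedy "\"\\(\\(?:.\\|\n\\)*[^\\]\\)\""
  "The same regexp with a greedy star, for comparison only.")

(defun my-first-match (regexp text)
  "Return the first substring of TEXT that REGEXP matches, or nil."
  (when (string-match regexp text)
    (match-string 0 text)))

;; An ordinary quoted string: the match includes both quotation marks.
(my-first-match my-quoted-re "say \"hi\" now")          ; => "\"hi\""

;; An empty string literal: no match, since [^\\] needs one character.
(my-first-match my-quoted-re "\"\"")                    ; => nil

;; A quote preceded by a backslash does not end the match.
(my-first-match my-quoted-re "\"a\\\"b\"")              ; => "\"a\\\"b\""

;; Two strings in one text: non-greedy stops at the first closing quote,
;; while the greedy variant runs on to the last one.
(my-first-match my-quoted-re "\"a\" and \"b\"")         ; => "\"a\""
(my-first-match my-quoted-re-greedy "\"a\" and \"b\"")  ; => "\"a\" and \"b\""
```

The last two calls suggest why the non-greedy operator matters for highlighting: with a greedy star, everything from the first quotation mark to the last one would be treated as a single string.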

A good resource is: http://www.emacswiki.org/emacs/RegularExpression.

Drew
  • Thanks @Drew, that's extremely helpful! The resource link you included however, left me confused. It doesn't explain why `\\ ` work, or the `: `. – DilithiumMatrix Aug 31 '13 at 17:12
  • The best reference doc about regexps, which does explain about `\\ ` and `:`, is the Elisp manual. Start with node `Regular Expressions`. Both are explained in node `Regexp Backslash`. See also node `Syntax for Strings` for the use of backslashes in Lisp strings. Remember that `i` in Info is your friend for finding things, and `g` takes you directly to a given node. – Drew Aug 31 '13 at 17:23