Emacs - Subword Regular Expressions Clarification

Question

I'm trying to change where the subword-mode commands (subword-forward, subword-backward, etc.) stop.

I noticed that subword.el defines regular expressions for forward and backward matching, and I've been experimenting with them to try to add more subword delimiters.

What I would really like is an explanation of how the subword regular expressions work, specifically what each part matches, so that I can change them to include the characters I want to stop on. I have a basic understanding of regular expressions and have used them before, but never any as large as those in subword.el.

I don't necessarily need help with both regular expressions. Guidance on adding delimiters to either one would be just as welcome, since that is my goal in changing them, but I would also like to understand how the regular expressions are put together.

Lastly, while searching for a solution, I found this related StackOverflow question. I read it, but subword.el does not seem to contain the regular expressions in the form shown in the quoted section of that question, and I don't understand what the last parenthetical statement in that quoted section means.

Edit:

To put what I am trying to do in clearer context: I want Ctrl+Left/Right in Emacs (subword-backward/subword-forward) to behave as much like Eclipse as possible, so that the cursor moves similarly and, once it reaches them, stops at the beginnings and ends of lines.

Here is another related StackOverflow question. The "viper" commands are much closer to what I am looking for, but slightly off, because I want the point to stop at the end of the line before continuing to the next.

Note that for more intricate subword-mode customisation, you can define alternative functions to assign to the `subword-forward-function` and `subword-backward-function` variables. If you're thinking about changing the syntax of arbitrary characters just to get different behaviour out of subword mode, you might find the function-based approach less of a hack. — phils, Mar 02 '14 at 02:10
Here is a link to a thread that contains an example you can modify to get the behavior you prefer. Because some major modes change the syntax table, you would need to write exceptions for whatever modes you normally use: http://stackoverflow.com/questions/18675201/alternative-to-forward-word-backward-word-to-include-symbols-e-g — lawlist, Mar 02 '14 at 16:15
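As phils notes, subword-forward and subword-backward call whatever functions are stored in `subword-forward-function` and `subword-backward-function`. Below is a minimal sketch of the Eclipse-like line handling asked about above. It assumes those variables exist in your Emacs, as phils describes, and that wrapping the stock `subword-forward-internal` and `subword-backward-internal` is acceptable. The `my/` function names are hypothetical. Each step is clamped so that it stops at the end (or beginning) of the current line before crossing to the next one.

```elisp
(require 'subword)

(defun my/subword-forward-stop-at-eol ()
  "Move forward one subword, but stop at the end of the line first.
At the end of a line, move to the beginning of the next line instead.
Hypothetical helper wrapping `subword-forward-internal'."
  (if (eolp)
      (unless (eobp) (forward-char 1))   ; cross only the newline
    (let ((eol (line-end-position)))
      (subword-forward-internal)
      (when (> (point) eol)              ; never run past this line
        (goto-char eol)))))

(defun my/subword-backward-stop-at-bol ()
  "Move backward one subword, but stop at the beginning of the line first.
At the beginning of a line, move to the end of the previous line instead.
Hypothetical helper wrapping `subword-backward-internal'."
  (if (bolp)
      (unless (bobp) (backward-char 1))  ; cross only the newline
    (let ((bol (line-beginning-position)))
      (subword-backward-internal)
      (when (< (point) bol)              ; never run before this line
        (goto-char bol)))))

;; Make subword-forward/subword-backward use the wrappers.
(setq subword-forward-function #'my/subword-forward-stop-at-eol
      subword-backward-function #'my/subword-backward-stop-at-bol)
```

In a buffer containing `fooBar  ` followed by a newline and `baz`, repeated forward steps should stop after `foo`, after `Bar`, at the end of the first line, at the start of `baz`, and after `baz`. Stopping at parens or quotes would still need regexp or syntax changes like the ones in the answer.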

Accepted Answer · edited Jun 20 '20 at 09:12

The answer to the question in your last paragraph is contained in the other answer on that same linked page: (modify-syntax-entry ?\\ "w"). That makes backslash be a word-constituent character, so word functions treat it as part of a word.
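You can check what that call does without editing a mode's real syntax table. Give `modify-syntax-entry` its optional third argument, a fresh table from `make-syntax-table` (which inherits from the standard syntax table), and compare `char-syntax` before and after. This is a minimal sketch, and the function name is hypothetical:

```elisp
(defun my/backslash-syntax-before-and-after ()
  "Return the syntax class of backslash before and after making it a word char.
Works on a fresh syntax table, so no mode's table is changed."
  (let ((table (make-syntax-table)))         ; inherits `standard-syntax-table'
    (with-temp-buffer
      (set-syntax-table table)
      (let ((before (char-syntax ?\\)))      ; class inherited from the parent
        (modify-syntax-entry ?\\ "w" table)  ; the call from the answer, scoped
        (list before (char-syntax ?\\))))))  ; second value is now ?w
```

In a real configuration you would pass the syntax table of the mode you care about instead of a fresh one.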
Please specify the behavior you are trying to implement, in particular, what you mean by "adding more subword delimiters."
The regexps in subword.el are fairly straightforward. You say you do not need help understanding those regexps. But then what do you mean by asking "how exactly the subword regular expressions are constructed"? They were likely constructed by hand (based on what you already understand their various parts to be for).
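To see what each part of a regexp matches, run it with `string-match` and inspect its groups. The sketch below assumes the default `subword-forward-regexp` is the string shown, which is the comma variant later in this answer without the comma. Check `C-h v subword-forward-regexp` in your Emacs to confirm. Read piece by piece:

- `\W*` skips leading non-word characters.
- Group 2 is a run of uppercase letters plus at most one non-word character.
- `[[:lower:][:digit:]]*` takes the lowercase/digit tail.
- Group 1 is the whole subword.

`case-fold-search` must be nil, otherwise `[[:upper:]]` also matches lowercase letters. The `my/` names are hypothetical.

```elisp
(defconst my/subword-forward-regexp-default
  "\\W*\\(\\([[:upper:]]*\\(\\W\\)?\\)[[:lower:][:digit:]]*\\)"
  "Assumed copy of the default `subword-forward-regexp'.")

(defun my/subword-forward-groups (string start)
  "Match `my/subword-forward-regexp-default' in STRING from START.
Return (END WHOLE-SUBWORD UPPERCASE-PART): match-end 0 and groups 1 and 2."
  (with-temp-buffer                    ; standard syntax table decides \W
    (let ((case-fold-search nil))      ; keep [[:upper:]] case-sensitive
      (when (string-match my/subword-forward-regexp-default string start)
        (list (match-end 0)
              (match-string 1 string)
              (match-string 2 string))))))

;; (my/subword-forward-groups "fooBarBaz" 0)  => (3 "foo" "")
;; (my/subword-forward-groups "fooBarBaz" 3)  => (6 "Bar" "B")
;; (my/subword-forward-groups "HTTPServer" 0) => (10 "HTTPServer" "HTTPS")
```

The movement code in subword.el inspects these match positions. When group 2 is longer than one character (`HTTPS` above), it appears to stop one character before its end, which is how `HTTPServer` gets split before the final `S`.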
My guess, since your description is still unclear to me, is that you just want to give some additional chars non-word syntax. If that is what you mean by "adding more subword delimiters", then just do that. For example, if you want the char a to be a non-word character, do something like this:

;; With no third (SYNTAX-TABLE) argument, this changes the current buffer's syntax table.
(modify-syntax-entry ?a ".") ; Or another nonword-constituent syntax class (this uses punctuation)

That makes a be a punctuation character instead of a word-constituent character. If you want some other syntax class than punctuation, then choose it similarly.
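To see the effect on movement without touching any mode's real table, apply the change to a private table and call `forward-word` in a temporary buffer. This is a sketch using the illustrative string `xyzaxyz`, and the function name is hypothetical:

```elisp
(defun my/forward-word-from-start (text &optional punct-char)
  "Insert TEXT in a temp buffer and return point after one `forward-word'.
If PUNCT-CHAR is non-nil, first give it punctuation syntax in a private table."
  (with-temp-buffer
    (let ((table (make-syntax-table)))       ; inherits `standard-syntax-table'
      (when punct-char
        (modify-syntax-entry punct-char "." table))
      (set-syntax-table table)
      (insert text)
      (goto-char (point-min))
      (forward-word 1)
      (point))))

;; (my/forward-word-from-start "xyzaxyz")     => 8, the whole string is one word
;; (my/forward-word-from-start "xyzaxyz" ?a)  => 4, stops before the `a'
```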

Update after comments

E.g., if you want any character with punctuation syntax to act the same as an uppercase letter, this will do it:

(defvar subword-forward-regexp
  "\\W*\\(\\(\\([[:upper:]]\\|\\s.\\)*\\(\\W\\)?\\)[[:lower:][:digit:]]*\\)"
  "Regexp used by `subword-forward-internal'.")

(defvar subword-backward-regexp
  "\\(\\(\\W\\|[[:lower:][:digit:]]\\)\\(\\([[:upper:]]\\|\\s.\\)+\\W*\\)\\|\\W\\w+\\)"
  "Regexp used by `subword-backward-internal'.")

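Treating punctuation like a capital shows up most clearly in the backward regexp. The sketch below runs `re-search-backward` from just after the `.` in `foo.bar`, once with the presumed default regexp and once with the punctuation variant above. The presumed default is the comma variant below without the comma. Only the search is modelled. In subword.el, `subword-backward-internal` then picks the actual stopping point from the match. The `my/` names are hypothetical.

```elisp
(defconst my/subword-backward-regexp-default
  "\\(\\(\\W\\|[[:lower:][:digit:]]\\)\\([[:upper:]]+\\W*\\)\\|\\W\\w+\\)"
  "Assumed copy of the default `subword-backward-regexp'.")

(defconst my/subword-backward-regexp-punct
  "\\(\\(\\W\\|[[:lower:][:digit:]]\\)\\(\\([[:upper:]]\\|\\s.\\)+\\W*\\)\\|\\W\\w+\\)"
  "The punctuation-as-uppercase variant from this answer.")

(defun my/backward-match-start (regexp text pos)
  "Search backward for REGEXP from POS in a temp buffer holding TEXT.
Return the start of the match found, or nil if there is none."
  (with-temp-buffer                    ; standard syntax table: `.' is punctuation
    (insert text)
    (goto-char pos)
    (let ((case-fold-search nil))
      (when (re-search-backward regexp nil t)
        (match-beginning 0)))))

;; In "foo.bar", position 5 is just after the `.'.
;; (my/backward-match-start my/subword-backward-regexp-default "foo.bar" 5) => nil
;; (my/backward-match-start my/subword-backward-regexp-punct   "foo.bar" 5) => 3
```

With the default regexp there is no match, so the command should fall back to ordinary `backward-word` and jump to the start of `foo`. With the variant, the `o.` pair matches, so point can stop just before the `.` instead.
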
Or if you want, say, just the comma (,) to act the same as an uppercase letter, this will do that:

(defvar subword-forward-regexp
  "\\W*\\(\\([,[:upper:]]*\\(\\W\\)?\\)[[:lower:][:digit:]]*\\)"
  "Regexp used by `subword-forward-internal'.")

(defvar subword-backward-regexp
  "\\(\\(\\W\\|[[:lower:][:digit:]]\\)\\([,[:upper:]]+\\W*\\)\\|\\W\\w+\\)"
  "Regexp used by `subword-backward-internal'.")

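One caveat with the forms above: a `defvar` does not change a variable that already has a value. So they have no effect if subword.el has already been loaded. Evaluating them with `C-M-x` is an exception, because `eval-defun` resets a `defvar`. Below is a sketch that works regardless of load order, using the comma variant, for an init file:

```elisp
;; Install the comma variant as soon as subword.el is (or already was) loaded.
(with-eval-after-load 'subword
  (setq subword-forward-regexp
        "\\W*\\(\\([,[:upper:]]*\\(\\W\\)?\\)[[:lower:][:digit:]]*\\)")
  (setq subword-backward-regexp
        "\\(\\(\\W\\|[[:lower:][:digit:]]\\)\\([,[:upper:]]+\\W*\\)\\|\\W\\w+\\)"))
```

`with-eval-after-load` runs its body immediately if the library is already loaded. Otherwise it runs the body right after the library loads, so the `setq` always wins over the library's own initial value.
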
If this is still not what you want, then try explaining what you want a bit better. E.g., you have not given a single example -- neither positive (should stop here) nor negative (should not stop here). You make those who try to help you guess more than they should have to, which is not efficient.

Sorry, let me try to clarify myself a little. Overall, by "delimiting", I meant I would like to change the regexps so that they will stop on additional characters, such as symbols, parens, etc. As far as the word/non-word differences are concerned, should I change the characters I want to stop on into word characters? — , Mar 01 '14 at 23:15
What do you mean by "stop on"? If you want a char not to be considered part of a word or subword then give it a non-word syntax. If you want it to instead be treated as a char that starts a subword, as is the case for uppercase, for example, then follow how `[:upper:]` is handled in the current subword code -- e.g., have the regexp do EITHER `[:upper:]` OR your char, etc. — Drew, Mar 01 '14 at 23:29
I've added more information to the original post describing the result I want. By "stopping on", I mean that the subword functions should move point to characters such as parens and quotes, and to the beginnings and ends of lines, in addition to the capital letters within words. — , Mar 01 '14 at 23:37
