Consider the following example:

case Foo:
    ...
    break;
case Bar:
    ...
    break;
case More: case Complex:
    ...
    break;
...

Say, we would like to retrieve all matches of the regex case \([^:]*\): (the whole matching text or, even better, the part between \( and \)), which should give us (preferably in a new buffer) something like this:

Foo
Bar
More
Complex
...

Another example of a use case would be extraction of some fragments of an HTML file, for instance, image URLs.

Is there a simple way to collect all regex matches and take them out to a separate buffer in Vim?

Note: It’s similar to the question “How to extract text matching a regex using Vim?”. However, unlike the setting in that question, I’m also interested in removing the lines that don’t match, preferably without a hugely complicated regex.

ib.
Wernight
  • Do you mean backreferences? `:%s/^\vcase ([^:]+):/\1/` Use `\1` to get the first capturing group. – mathematical.coffee Jan 31 '12 at 12:38
  • If you just want to extract these to a new file (it's unclear from your question), you could do this more easily with sed or grep; sed example: `sed -n '/^\s*case\s\+/{s/\s*case\s\+\([^:]\+\):/\1/;p}' file` – beerbajay Jan 31 '12 at 12:57
  • @beerbajay: Yes in a new file it's fine. I agree sed would do it well, just I would have to start a command prompt and find the file again, so I'm looking for a Vim solution. – Wernight Jan 31 '12 at 13:06
  • @mathematical.coffee: Not at all. The issue is not search & replace (unless you include new lines) but grabbing all matches and putting them in another buffer. – Wernight Jan 31 '12 at 14:14
  • This is very similar to this question: http://stackoverflow.com/questions/4503748/remove-everything-except-regex-match-in-vim/4521486 – Peter Rincker Jan 31 '12 at 14:45
  • @PeterRincker: You're right. The question was formulated differently but it's pretty much the same goal. Seems there is no "simple" answer. :( – Wernight Jan 31 '12 at 15:53

5 Answers

There is a general way of collecting pattern matches throughout a piece of text. The technique takes advantage of the substitute with an expression feature of the :substitute command (see :help sub-replace-\=). The key idea is to use a substitution enumerating all of the pattern matches to evaluate an expression storing them without replacement.

First, let us consider saving the matches. In order to keep a sequence of matching text fragments, it is convenient to use a list (see :help List). However, it is not possible to modify a list straightforwardly, using the :let command, since there is no way to run Ex commands in expressions (including \= substitute expressions). Yet, we can call one of the functions that modify a list in place, for example, the add() function that appends a given item to a list (see :help add()).

Another problem is how to avoid text modifications while running a substitution. One approach is to make the pattern always have a zero-width match by prepending \ze or by appending \zs atoms to it (see :help /\zs, :help /\ze). The pattern modified in this way captures an empty string preceding or succeeding an occurrence of the original pattern in text (such matches are called zero-width matches in Vim; see :help /zero-width). Then, if the replacement text is also empty, substitution effectively changes nothing: it just replaces a zero-width match with an empty string.

Since the add() function, like most of the list modifying functions, returns the reference to the changed list, for our technique to work we need to somehow get an empty string from it. The simplest way is to extract a sublist of zero length from it by specifying a range of indices such that a starting index is greater than an ending one.
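The empty-sublist trick can be checked in isolation. The following is a minimal sketch, assuming a scratch list named m (the item values are only illustrative); it relies on the documented behavior that a sublist whose start index lies beyond its end index is empty, and that a List result of a \= expression is joined with line breaks, so an empty list becomes an empty replacement string.

```vim
" Illustrative scratch list; the name and items are assumptions
let m = []

" add() modifies the list in place and returns that same list
echo add(m, 'Foo')

" Slicing with a start index greater than the end index yields []
echo add(m, 'Bar')[1:0]

" Both items were still stored as a side effect
echo m
```

The three :echo commands print ['Foo'], then [], then ['Foo', 'Bar'].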

Combining the aforementioned ideas, we obtain the following Ex command:

:let m=[] | %s/\<case\s\+\(\w\+\):\zs/\=add(m,submatch(1))[1:0]/g

After its execution, all matches of the first subgroup are accumulated in the list referenced by the variable m, and can be used as is or processed in some way. For instance, to paste the contents of the list one by one on separate lines in Insert mode, type

Ctrl+R =m Enter

To do the same in Normal mode, simply use the :put command:

:put=m
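Since the question asks for the matches in a separate buffer, the collecting command and :put can be combined with :new. This is only a sketch: the sample text loaded with setline() is hypothetical, and in practice the substitution would be run on the buffer being edited. The e flag is added so that a buffer without matches does not raise an error.

```vim
" Hypothetical sample text in a fresh buffer, just for the demonstration
new
call setline(1, ['case Foo:', '    break;', 'case Bar:', '    break;',
      \ 'case More: case Complex:', '    break;'])

" Collect the first subgroup of every match without changing the text
let m = []
%s/\<case\s\+\(\w\+\):\zs/\=add(m, submatch(1))[1:0]/ge

" Open another empty buffer and put the items there, one per line
new
put =m

" :put inserts below the initial empty line, so remove that line
1delete _
```

The new buffer then holds Foo, Bar, More, and Complex on separate lines.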

Starting with version 7.4 (see :helpg Patch 7.3.627), Vim evaluates a \= expression in the replacement string of a substitution command for every match of the pattern, even when the n flag is given (which instructs it to simply count the number of matches without substituting—see :help :s_n). What the expression evaluates to does not matter in that case, because the resulting value is being discarded anyway, as no substitution takes place during counting.

This allows us to take advantage of the side effects of an expression without worrying about leaving the contents of the buffer intact in the process, so all the trickery with zero-width matching and empty-sublist indexing can be elided:

:let m=[] | %s/\<case\s\+\(\w\+\):/\=add(m,submatch(1))/gn

Conveniently, the buffer does not even get marked as modified after running this command.
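The same approach should carry over to the other use case mentioned in the question, image URLs in HTML. The sketch below is an assumption-laden illustration: the sample markup is hypothetical, and the pattern only handles double-quoted src attributes that are preceded by whitespace inside an img tag. It needs Vim 7.4 or later, because it relies on the n flag behavior described above.

```vim
" Hypothetical sample markup in a fresh buffer, just for the demonstration
new
call setline(1, [
      \ '<p><img class="x" src="a.png" alt=""> <img src="b/c.jpg"></p>',
      \ '<p>no image here</p>'])

" Collect the src value of every img tag; the buffer is left untouched
let urls = []
%s/<img[^>]*\ssrc="\([^"]*\)"/\=add(urls, submatch(1))/gn

" Show the collected URLs, one per line
echo join(urls, "\n")
```

For this sample, urls ends up as ['a.png', 'b/c.jpg'].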

ib.
  • Nice answer. I especially like the little trick with `extend()` in the replace expression. – Herbert Sitz Jan 31 '12 at 20:11
  • @HerbertSitz: Thanks, I just have noticed that it is possible to use the `add()` function instead of `extend()`. By the way, I have rewritten the answer to explain the technique in more detail. – ib. Feb 01 '12 at 06:43
  • Nice trick. Since the substitution has the side effect of setting 'modified', anyway, we can alternatively have `add()` return the last added element `[-1]`; this saves us from the zero-width match and capture: `:let t=[] | %s/\ – Ingo Karkat Sep 14 '12 at 07:39
  • @Ingo: But then we will end up with the list containing `case Foo:`, `case Bar:`, etc, and not `Foo`, `Bar`, etc, as required. It seems that we can't solve the problem correctly without changing boundaries of the match using `\zs` or `\ze` anyway. – ib. Sep 17 '12 at 22:56

Though it's possible to write a one-liner to accomplish your example, it's hard to type commands such as :%s/case \([^:]*\):/\=.../ interactively.

I prefer using vim-grex with the following steps:

  1. Use / to check whether a regular expression matches the expected lines. For example: /^\s*\<case\s\+\([^:]*\):.*$<Enter>
  2. Execute :Grey. It yanks the lines matching the current search pattern.
  3. Open a new buffer with :new or similar.
  4. Put the yanked lines with p or similar.
  5. Trim the uninteresting parts with :%s//\1/.
Kana Natsuno

How to use a Vim regex to extract the filetype word (here 'help') from the following line, given that 'help' might be any word like 'rust' or 'perlang'.

vim:tw=78:ts=8:ft=help:norl:

Solution:

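" Sample input: the modeline shown above
let foo = 'vim:tw=78:ts=8:ft=help:norl:'
" Replace the whole line with the word captured after ft=
let foo = substitute(foo, '^\s*vim:.*:ft=\([a-z]\+\).*:\s*$', '\1', '')
echo "foo: '" . foo . "'"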

Prints:

foo: 'help'

Guru meditation: What's going on here?

Take the string in the variable foo and match it against the pattern: the start of the line, any amount of whitespace, the literal vim followed by a colon, any run of characters, then :ft= followed by a word of lowercase letters, then anything up to a final colon, optional trailing whitespace, and the end of the line. The parenthesized group \([a-z]\+\) captures the filetype word, and \1 in the third argument of substitute() refers back to it. Because the pattern matches the whole string, the whole string is replaced with just the captured word.
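When only the captured group is wanted, matchlist() is an alternative to substitute(); it is a different technique from the one used in this answer. A minimal sketch, assuming the same sample modeline (the variable names line, parts, and ft are hypothetical):

```vim
" Same sample modeline as above
let line = 'vim:tw=78:ts=8:ft=help:norl:'

" matchlist() returns the whole match followed by the submatches,
" or an empty list when the pattern does not match
let parts = matchlist(line, '^\s*vim:.*:ft=\([a-z]\+\).*:\s*$')

" parts[1] holds the first parenthesized group
let ft = empty(parts) ? '' : parts[1]
echo ft
```

Unlike the substitute() version, which returns the input unchanged when the pattern fails, this leaves ft empty for a line that is not a modeline.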

As a general philosophy, any regex longer than your finger on the screen is an epic fail, so decrease screen resolution until it fits.

Eric Leschinski

A small addition to ib.'s accepted answer, which works well as is: it seems the n flag is enough to avoid the issues with unwanted substitution.

:let t=[] | %s/\<case\s\+\(\w\+\):/\=add(t,submatch(1))/gn

From :help :s_flags:

[n] Report the number of matches, do not actually substitute. The [c] flag is ignored. The matches are reported as if 'report' is zero. Useful to count-items. If \= sub-replace-expression is used, the expression will be evaluated in the sandbox at every match.

Stalpotaten
  • I just came across this behavior of the `n` flag while scanning `:help :s_flags` for something else! After going back and updating my answer to take advantage of this feature, I noticed that you have already discovered it since then, too. Great job on catching that! – ib. Feb 16 '23 at 07:50
  • Turns out, it was introduced during the development of Vim 7.4 (see `:helpg Patch 7.3.627`), and when I was writing my original answer, it did not exist yet (it was committed to Vim repository eight months later in August 2012 and released with version 7.4 one more year after that in August 2013). I wish I had learned about it earlier. – ib. Feb 16 '23 at 07:50
:g/^case\s\L\l\+:\s*case/s/:\zs\s*\zecase/\r/g
:let @a=''|g/^case\s\L\l\+:/y A

Now open a new buffer or temporary file, and apply:

"ap
:%s_^\vcase ([^:]+):_\1_

Or if you don't care for your current buffer (you can undo this of course) (updated for the complex example):

:g/^case\s\L\l\+:\s*case/s/:\zs\s*\zecase/\r/g
:v/^case\s\L\l\+:/d
:%s_^\vcase ([^:]+):_\1_
ib.
Zsolt Botykai
  • There are definitely some errors in the commands listed in the first code snippet. Have you run them before posting? Neither of those two commands will even run! What you probably meant is something like `:let@a=''|g/^case\s\L\l\+:/y A`. – ib. Jan 31 '12 at 13:34
  • `:v/.../d` or `:g!/.../d` is a nice trick, so it deletes all non matching lines. However it's not really exacting the regex matched expression. It's extracting the matching lines and then supposing there is single match per line the second search & replace would work. It wouldn't work in the general case. I'll update my sample. – Wernight Jan 31 '12 at 14:12
  • @ib. thanks for pointing it out, you are right. This happens when I'm on Windows, in front of Excel... updating the answer. – Zsolt Botykai Jan 31 '12 at 14:51
  • @Wernight, OK, I had updated my answer for your special case. – Zsolt Botykai Jan 31 '12 at 15:06