
I would like to extract the usernames from a long text file built from Twitter posts. I have tried expressions such as

:%s#\([^@].\{-}\) ##g
:%s#\(\<[^@].\{-}\>\) ##g

but it doesn't work. I read Vim's documentation for @, but, as far as I know, it applies to an escaped @, not a plain @.

How would I build an expression which erases the words which do not begin with "@"?

the Tin Man

3 Answers


You can use this regex in vim:

@\@<!\<\w\+\>

This will match all words that are not preceded by @ character.

To match all runs of non-space characters not preceded by an @ character, use:

@\@<!\<\S\+\>

\@<! is Vim's syntax for negative lookbehind, so @\@<! is equivalent to (?<!@) in Perl-style regex flavors.
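For readers more at home with Perl-style regexes, here is a rough Python sketch of the same idea (Python, not Vim): (?<!@) plays the role of @\@<!, \b stands in for both \< and \>, and re.ASCII keeps \w to [A-Za-z0-9_] like Vim's \w. The helper names are hypothetical, and \b only approximates Vim's \< and \>, which depend on the 'iskeyword' option.

```python
import re

# Python spelling of the Vim pattern  @\@<!\<\w\+\>
#   (?<!@)  negative lookbehind: not preceded by "@"   (Vim: @\@<!)
#   \b      word boundary                              (Vim: \< and \>)
#   \w+     one or more word characters                (Vim: \w\+)
# re.ASCII limits \w to [A-Za-z0-9_], matching Vim's \w.
NON_MENTION_WORD = re.compile(r"(?<!@)\b\w+\b", re.ASCII)


def words_without_at(text):
    """Return the words that are not preceded by '@' (hypothetical helper)."""
    return NON_MENTION_WORD.findall(text)


def erase_words_without_at(text):
    """Delete those words, leaving the @usernames and punctuation behind."""
    return NON_MENTION_WORD.sub("", text)


sample = "RT @alice: hello @bob_99 world"
print(words_without_at(sample))              # ['RT', 'hello', 'world']
print(repr(erase_words_without_at(sample)))  # ' @alice:  @bob_99 '
```

As in Vim, deleting the words leaves the surrounding spaces and punctuation in place.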

anubhava
    You can leave off the unnecessary grouping `\(...\)` here, as `@` is a single atom. Makes it slightly easier to read and type. – Ingo Karkat Dec 08 '14 at 12:56
  • Better escape that `+` to make it `\+` unless you plan on using `\v`. e.g. `:%s/\v\@@<!<\w+>//g`. See `:h /\v` for more on magic – Peter Rincker Dec 08 '14 at 20:39
  • Thanks for your solution. I would very much appreciate it if you could point me to a good document about this. You know, my regexp level is more or less what is covered at http://vimregex.com/. I am certainly not a vim ninja :/ – Juan Luis Chulilla Dec 08 '14 at 22:20
  • You can check [this post](http://ssiaf.blogspot.com/2009/07/negative-lookbehind-in-vim.html) and [this post](http://www.inputoutput.io/lookbehind-lookahead-regex-in-vim/). Also, if it worked out, you may mark the answer as accepted by clicking the tick mark at the top left of my answer. – anubhava Dec 08 '14 at 22:29

I don't know why you want to do this in vim. Since you mention vim, I assume you have a Unix/Linux OS. Based on a solution for extracting words from a file, I found the following:

grep -o -E '@\w+' twitterlog.txt > usernames.txt
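The same "match what you want" idea can be sketched in Python for systems without grep. re.findall with @\w+ returns each username in order, much as grep -o prints each match on its own line. The function name, the script name in the usage comment, and the UTF-8 encoding are assumptions for illustration.

```python
import re
import sys

# "@" followed by one or more word characters, as in grep -o -E '@\w+'.
# re.ASCII limits \w to [A-Za-z0-9_].
MENTION = re.compile(r"@\w+", re.ASCII)


def extract_usernames(text):
    """Return every @username in text, in order of appearance."""
    return MENTION.findall(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_usernames.py twitterlog.txt > usernames.txt
    with open(sys.argv[1], encoding="utf-8") as log:
        for name in extract_usernames(log.read()):
            print(name)  # one username per line, like grep -o
```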
Conffusion
    Just because someone mentions vim does *NOT* tie them to *nix. It runs on [a large number of non-*nix platforms](http://en.wikipedia.org/wiki/Vim_(text_editor)#Availability). – the Tin Man Dec 08 '14 at 23:02

Your question asks "How can I remove everything not matching some pattern?".

I want to answer "How do I capture all the matches (then delete the contents of the buffer and paste the matches)?"

Why not "remove everything not matching some pattern"?

Regexes are good at matching patterns; not matching is trickier. Sure, sometimes you can use negative lookaheads and lookbehinds, but not every case is so straightforward. Matching exactly what you want is far easier. However, if you do want to do it, here is as close as I can get without breaking my brain:

:%s/.\&\(@\w*\)\@<![^@]//g

Note: this leaves trailing spaces and blank lines
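To see what that command deletes, and why some characters survive, here is a character-by-character model in Python. It is only a model under stated assumptions: Python's re module has no variable-width lookbehind, so the \(@\w*\)\@<! test is done by searching the text to the left of each character; \w is restricted to ASCII; each line is handled on its own; and the lookbehind is assumed to see the original, unmodified line. The function name is hypothetical.

```python
import re

# Matches when the text ends with "@" plus zero or more word characters,
# i.e. exactly when Vim's \(@\w*\)\@<! lookbehind would fail.
ENDS_WITH_MENTION = re.compile(r"@\w*\Z", re.ASCII)


def delete_non_mentions(line):
    """Keep only what the Vim command would keep on one line (model only)."""
    kept = []
    for i, ch in enumerate(line):
        # [^@] never deletes "@" itself; any other character is deleted
        # unless the original text to its left ends with @\w*.
        if ch == "@" or ENDS_WITH_MENTION.search(line[:i]):
            kept.append(ch)
    return "".join(kept)


print(repr(delete_non_mentions("RT @alice: hi @bob_99 world")))  # '@alice:@bob_99 '
```

In this model the character right after each username (here ":" and a space) is also kept, because it is still preceded by @\w*, which would account for the leftover spaces mentioned in the note.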

Overview

The idea is to capture each match via :s and, in the replacement, execute an expression that builds up the matches in a register. Then delete all the lines with :d and put the register with the matches back into the buffer.

The How

:let @a = ""
:%s/@\w\+/\=setreg('A', submatch(0), 'l')/gn
:%d_
:%pu a
:1d_

Glory of details

  • Clear the a register via let @a = ""
  • Match the twitter users via @\w\+ pattern
  • Use \= inside the replacement of the :s to execute an expression
  • Use setreg() to set the value of the register
  • Using a capital register (A) appends instead of replacing
  • submatch(0) yields the matched text
  • Passing 'l' as the 3rd argument appends each match line-wise
  • The n flag prevents the buffer from being altered (optional)
  • :%d_ deletes the entire buffer into the black hole register
  • :pu a puts the a register
  • :1d_ removes the empty first line
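The register trick is a general pattern: run a substitution whose replacement is a function with a side effect. Here is a rough Python analogue, an illustration rather than a translation of any Vim API: re.sub calls a function for every match, the function appends the match to a list standing in for register a, and it returns the match unchanged so the text is left alone. re.sub visits every match on a line, as :s does with the g flag. collect_matches is a hypothetical name.

```python
import re


def collect_matches(lines, pattern):
    """Return every match of pattern, in order, one list item per match."""
    regex = re.compile(pattern, re.ASCII)
    register = []  # stands in for register a after :let @a = ""

    def append_match(m):
        register.append(m.group(0))  # roughly setreg('A', submatch(0), 'l')
        return m.group(0)            # replace the match with itself: text unchanged

    for line in lines:
        regex.sub(append_match, line)  # every match on the line, like the g flag
    return register


buffer = ["RT @alice: hi @bob_99", "no mentions here", "@carol"]
buffer = collect_matches(buffer, r"@\w+")  # the "buffer" now holds only the matches
print(buffer)  # ['@alice', '@bob_99', '@carol']
```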

Well, that is great, but it is so much to type...

It may be a lot to type compared to :%!grep -E -o '@\w+', but it is a pure vim solution. We can shorten it to a single line if that would be better:

:let @a = "" | %s/@\w\+/\=setreg('A', submatch(0), 'l')/gn | %d_ | %pu a | 1d_

Probably not, if you have to do anything like this on a regular basis. Instead, here is a quick 'n' dirty command to put in your ~/.vimrc file:

" Extractomatic
" Replace the current buffer with each match on a separate line
" Usage:
"     :Extractomatic/pattern/
command! -nargs=+ Extractomatic
      \ let s:var = @a |
      \ let  @a = "" |
      \ %s<args>\=setreg('A', submatch(0), 'l')/gn |
      \ %d_ |
      \ %pu a |
      \ 1d_ |
      \ let @a = s:var

Now you can just do :Extractomatic/@\w\+/.

However, there are more robust solutions to this, such as Ingo Karkat's Extract Matches plugin and the Yankitute plugin.

Conclusion

Personally, whichever way you choose to solve this problem is fine. However, knowing how to use :s with a sub-replace-expression is a great way to level up your vim-script-fu.

More help

:h :s
:h sub-replace-expression
:h submatch(
:h setreg(
:h registers
:h :d
:h :pu
:h range
Peter Rincker