I have a text file like this:

"an arbitrary string" = "this is the text one"
"other arbitrary string" = "second text"
"a third arbitrary string" = "the text number three"

I want to extract only this:

an arbitrary string
other arbitrary string
a third arbitrary string

That is, the text inside the first pair of quotes, i.e. between the first " and the following " =. I used this regex:

(?!").*(?=(" =))

This works when I try it in RegExr and in this online tool, but in my OS X Terminal it does not work: the output is empty.

grep -o '(?!").*(?=(" =))' input.txt

What is wrong here? Do I have to escape some characters? I have tried escaping each of them and nothing changes.

Thank you so much and please excuse my lack of knowledge about this topic.

emenegro
  • Have you tried `grep -Po '...' input.txt`? It works for me, but I don't know whether it does on OS X. – fedorqui Sep 02 '14 at 12:06
  • When I saw the `osx` tag, I knew the sed/grep there wouldn't be the same as the GNU ones, but I don't think that grep supports PCRE by default. – Kent Sep 02 '14 at 12:08
  • Yes, @fedorqui, and the Terminal just responds with the usage help of the grep command :\ – emenegro Sep 02 '14 at 12:09

3 Answers

Lookaheads and lookbehinds are PCRE features, so you have to use the -P option:

grep -Po '(?!").*(?=(" =))' input.txt
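
Since the comments on the question suggest that the stock OS X grep answers -P with its usage help, it may be worth checking first whether the local grep accepts the option at all. The following is only a minimal sketch; `has_pcre_grep` and `pcre_status` are hypothetical names introduced for illustration.

```shell
# has_pcre_grep is a hypothetical helper name, not part of grep.
has_pcre_grep() {
  # A grep that does not know -P exits non-zero with a usage error,
  # which is silenced here so only the exit status is used.
  printf 'x\n' | grep -qP 'x' 2>/dev/null
}

if has_pcre_grep; then
  pcre_status="grep -P is available"
else
  pcre_status="grep -P is not available in this grep"
fi
echo "$pcre_status"
```

If the check fails, the lookahead pattern cannot be used with that grep as written.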
steffen

This should do:

awk -F\" '{print $2}' file

It uses " as the field separator and then prints the second field.
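
As a quick illustration on the sample data from the question, here is a small sketch. The file name `sample.txt` and the wrapper function `first_quoted` are hypothetical, chosen only for this example.

```shell
# Recreate the three sample lines from the question in a scratch file
# (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
"an arbitrary string" = "this is the text one"
"other arbitrary string" = "second text"
"a third arbitrary string" = "the text number three"
EOF

# first_quoted is a hypothetical wrapper around the awk one-liner.
# With " as the separator, $1 is the empty text before the first quote
# and $2 is the text between the first and the second quote.
first_quoted() {
  awk -F'"' '{print $2}' "$@"
}

first_quoted sample.txt
# prints:
# an arbitrary string
# other arbitrary string
# a third arbitrary string
```

This uses only awk, so it does not depend on grep supporting -P.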

Jotne

steffen's answer is right: you have to use the -P flag. But there is also a problem with your regex.

Imagine this input: "an arbitrary string" = " =this is the text one"

Your regex will fail on it: because .* is greedy, the match runs up to the last " = on the line instead of the first one. To solve this, you have to use something like this:

grep -Po '^"\K.*?(?=(" =))'

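  • ^ anchors the match at the start of the line, which prevents matches that begin anywhere else.
  • \K drops everything matched so far from the reported match. It is easier to read than a lookbehind, and unlike a lookbehind it allows the text before it to have arbitrary length.
  • .*? makes the match non-greedy, so it stops at the first " =.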
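
The difference between the two patterns can be seen on the problematic line above. This is only a sketch and assumes a grep that supports -P (for example GNU grep); the variable names `line`, `greedy` and `lazy` are made up for the example.

```shell
# The tricky input: the value on the right-hand side itself contains `" =`.
line='"an arbitrary string" = " =this is the text one"'

# Original pattern: greedy .* backtracks only as far as the LAST `" =`,
# so the match spills over into the right-hand side.
greedy=$(printf '%s\n' "$line" | grep -Po '(?!").*(?=(" =))')

# Anchored, non-greedy pattern: stops at the FIRST `" =`.
lazy=$(printf '%s\n' "$line" | grep -Po '^"\K.*?(?=(" =))')

# Brackets make the trailing space in the greedy result visible.
printf 'greedy: [%s]\n' "$greedy"   # greedy: [an arbitrary string" = ]
printf 'lazy:   [%s]\n' "$lazy"     # lazy:   [an arbitrary string]
```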
Aleks-Daniel Jakimenko-A.
  • Thank you so much for your answer, but when I use your regex in regexr.com with the example from my question it does not work :\ – emenegro Sep 03 '14 at 06:12
  • @emenegro Yes. I suppose that regexr.com uses the JavaScript built-in regex engine, which does not support `\K`. Indeed, a lot of regex features work exclusively in Perl, but `\K` works in `grep` too! It is up to you: you can use `(?!^")` instead of `^"\K`, and both should work in `grep -P`. – Aleks-Daniel Jakimenko-A. Sep 03 '14 at 15:21