
I use pcregrep to find patterns that span multiple lines in HTML code.

I am trying to match something that looks like this:

<some-html-stuff>
                    sometext<more-html-stuff>

The space between the beginning of the line and sometext is exactly six tabs. Since the expression \s matches tabs, line breaks and spaces, I thought that

pcregrep -M -o -H "(?<=some-html-stuff\>[\s]{7})[A-Za-z0-9]*" /path/file.html

would do the job for me, but it does not (I added one to the count for the line break). I also tried several variations of the number, but none of them works.

What did I overlook?

edit:

The match should be sometext, without any leading whitespace.

joaoal

2 Answers


This regex will work for you:

(?<=some-html-stuff\>\n\s{7})([A-Za-z0-9]+)

You need to insert \n before \s{7} to match seven tabs, or else use \s{8}, like this:

(?<=some-html-stuff\>)\s{8}([A-Za-z0-9]+)

since \s also matches \n.
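The counting rule can be checked outside pcregrep. The following is only a sketch in Python's `re` module (not pcregrep itself), and it assumes the layout described in the question: a line break followed by six tabs. The `sample` string and the variable names are made up for illustration; adjust the counts to whatever the real file contains.

```python
import re

# Hypothetical sample mirroring the question: tag, line break, six tabs, text.
sample = "<some-html-stuff>\n" + "\t" * 6 + "sometext<more-html-stuff>"

# Line break written explicitly, then exactly six whitespace characters.
explicit_newline = re.compile(r"(?<=some-html-stuff>\n\s{6})[A-Za-z0-9]+")

# Line break folded into the count: \s also matches "\n", so 6 tabs + 1 break = 7.
folded_newline = re.compile(r"(?<=some-html-stuff>\s{7})[A-Za-z0-9]+")

# Off by one: the lookbehind allows six whitespace characters, but seven are present.
too_few = re.compile(r"(?<=some-html-stuff>\s{6})[A-Za-z0-9]+")

matches_explicit = explicit_newline.findall(sample)
matches_folded = folded_newline.findall(sample)
matches_too_few = too_few.findall(sample)

print(matches_explicit, matches_folded, matches_too_few)
```

Because the whitespace sits inside the lookbehind, it is not part of the returned match. The count has to be exact, since a lookbehind of this form is fixed-length.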

RegEx Demo

anubhava
  • Almost works, but I don't want the whitespace to be part of the string that is returned. That's why it needs to be included in the lookbehind... – joaoal Aug 03 '14 at 10:14
  • You can use: `(?<=some-html-stuff\>\n\s{7})([A-Za-z0-9]+)` for that. – anubhava Aug 03 '14 at 10:15
  • It does not work. My complete code is `(?<=\"some-htmlstuff\"\>\n\s{6})([A-Za-z0-9\s\.]+)"` with only 6 in the braces, since there are only six tabs... I have tried this solution too, but it does not work. Does it work for you? – joaoal Aug 03 '14 at 10:40
  • Did you check the demo link? If it doesn't work for me I don't post an answer here. – anubhava Aug 03 '14 at 10:42
  • I did. This demo site is awesome! I will have to fix it inside my code, but now I know that in theory it works. Thanks! – joaoal Aug 03 '14 at 10:46
  • Yes this is indeed an awesome online regex tester. Glad it worked out for you. – anubhava Aug 03 '14 at 10:48
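When a pattern works in a demo but not on the real file, it can help to look at the whitespace that actually follows the tag. The sketch below is an assumption-laden illustration in Python, not part of the answer: `whitespace_after` is a hypothetical helper, and the sample deliberately uses Windows line endings (`\r\n`) to show how the count can differ from what is expected.

```python
import re

def whitespace_after(tag_end, text):
    """Return the repr of each whitespace run that directly follows tag_end."""
    pattern = re.compile(re.escape(tag_end) + r"(\s*)")
    return [repr(m.group(1)) for m in pattern.finditer(text)]

# Hypothetical content with a Windows line ending, an assumption for illustration.
sample = "<some-html-stuff>\r\n" + "\t" * 6 + "sometext<more-html-stuff>"

runs = whitespace_after("some-html-stuff>", sample)
print(runs)
```

In this made-up sample the run is eight characters long (carriage return, line feed, six tabs), so a lookbehind that counts seven would not match it.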

You could use \K instead of a lookbehind:

pcregrep -M -o -H "<some-html-stuff\>\s*\K[A-Za-z0-9]+" /path/file.html

DEMO

OR

pcregrep -M -o -H "some-html-stuff\>\n\t{7}\K[A-Za-z0-9]+" /path/file.html
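For comparison, Python's built-in `re` module has no `\K`, so a capture group is the closest stand-in. This is a sketch of that substitute, not of `\K` itself, with a made-up `sample` that assumes a line break and six tabs as in the question. It shows the same benefit: the prefix is matched normally, so `\s*` can be variable-length, and only the wanted text is kept.

```python
import re

# Hypothetical sample: tag, line break, six tabs, then the text to extract.
sample = "<some-html-stuff>\n" + "\t" * 6 + "sometext<more-html-stuff>"

# The prefix (tag plus any amount of whitespace) is consumed by the match,
# but only group 1 is kept, which plays the role of the text after \K.
pattern = re.compile(r"<some-html-stuff>\s*([A-Za-z0-9]+)")

kept = [m.group(1) for m in pattern.finditer(sample)]
print(kept)
```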

DEMO

Avinash Raj