
So I have just begun learning regular expressions, and I need to extract a substring from a large string.

My string is basically one huge line containing a lot of stuff. I have identified the pattern that surrounds the part I need to extract. I need the number in this line:

A lot of stuff<li>65,435 views</li>a lot of stuff

(The number is just an example.)

This entire string is in fact one big line, and my file views.txt contains many such lines.

So I tried this:

while read p
do
y=`expr "$p": ".*<li>\(.*\) views "`
echo $y
done < views.txt

I want to iterate over every line in views.txt and print the number from each one.

But I get a syntax error, and I have no idea what is going wrong. I believe I have correctly flanked the number with <li> and views, including the spaces.

My (limited) interpretation of the above regex leads me to believe that it would output the number.
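For reference, here is a minimal sketch of how expr's ":" operator reads a pattern like this one, run on the sample line above. It assumes a POSIX-style expr such as the GNU coreutils one: the regex is a basic regular expression implicitly anchored at the start of the string, and when it contains a \( ... \) group, expr prints only the text that group captured; without a group it prints the length of the match.

```shell
#!/bin/sh
# The sample line from the question.
line='A lot of stuff<li>65,435 views</li>a lot of stuff'

# With a \( ... \) group, expr prints the captured text.
expr "$line" : '.*<li>\(.*\) views'   # prints 65,435

# Without a group, expr prints how many characters matched
# ("A lot of stuff<li>" is 18 characters).
expr "$line" : '.*<li>'               # prints 18
```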

Any help is appreciated.

tofu

1 Answer


The syntax error occurs because the ":" is not separated from "$p" by a space (or tab), so the shell passes "$p": to expr as a single argument and expr never sees the ":" operator. With that fixed, the regex still has a trailing blank, which prevents it from matching. Once both problems are fixed, your sample script works as intended.
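Putting the two fixes together, the loop might look like the sketch below. The wrapper name print_view_counts and the temporary demo file are hypothetical, added only so the snippet runs on its own. Using IFS= read -r, $( ) instead of backticks, and quoting "$y" is extra hardening, not part of the fix described above.

```shell
#!/bin/sh
# print_view_counts is a hypothetical name. The loop body is the question's
# script with the two fixes applied:
#   1. a space between "$p" and ":", so expr receives the string, the ":"
#      operator and the regex as three separate arguments;
#   2. no trailing blank after "views", because in the sample line "views"
#      is followed by "</li>", not by a space.
# IFS= read -r keeps leading blanks and backslashes in each line intact.
print_view_counts() {
    while IFS= read -r p
    do
        y=$(expr "$p" : '.*<li>\(.*\) views')
        echo "$y"
    done < "$1"
}

# Demo on a temporary file holding the question's sample line.
tmp=$(mktemp)
printf '%s\n' 'A lot of stuff<li>65,435 views</li>a lot of stuff' > "$tmp"
print_view_counts "$tmp"    # prints 65,435
rm -f "$tmp"
```

On the real file this would be called as print_view_counts views.txt. A line that does not contain the pattern produces an empty output line, since expr prints a null string (and exits with status 1) when the group does not match.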

Thomas Dickey
  • (1) Why do you need a space between $p and the :? (2) I thought that since there is a lot of stuff after views, I needed a space for the regex to match. So how is that trailing space interpreted? – tofu Feb 15 '15 at 02:15
  • (1) The space is needed so that expr sees the : as a separate argument rather than as part of the string, and (2) the trailing space would only match a space right after "views", but your example has none. – Thomas Dickey Feb 15 '15 at 02:19
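A small sketch of both points, using the question's sample line: counting the arguments shows why the space before ":" matters, and running expr with and without the trailing blank shows why the original regex never matched. The use of set -- here is only an illustrative way to count words and is not from the original thread.

```shell
#!/bin/sh
# Note: "set --" overwrites this script's positional parameters; it is
# used here only to count how many arguments each form produces.
line='A lot of stuff<li>65,435 views</li>a lot of stuff'

# (1) Without a space, "$line": is ONE word (the line with ":" glued on),
# so expr would get only two arguments and report a syntax error.
set -- "$line": '.*<li>\(.*\) views'
echo "$#"    # prints 2
set -- "$line" : '.*<li>\(.*\) views'
echo "$#"    # prints 3

# (2) With a trailing blank the regex needs " views " (a space after
# "views"), but the sample line has "</li>" there, so nothing matches:
# expr prints an empty line and exits with status 1.
expr "$line" : '.*<li>\(.*\) views ' || echo "no match"
expr "$line" : '.*<li>\(.*\) views'    # prints 65,435
```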