I have an HTML file with thousands of lines, and one kind of line is repeated throughout it:

CODE=12345-ABCDE-12345-ABCDE</div>...<!--This line goes on for hundreds of characters-->

The line starts with "CODE=" every time, and the length of the code is the same every time. The following 28 characters are either letters, numbers, or dashes. So far I have:

cat mysite.html | grep "CODE="

But I'd like a regex that displays everything on the line BEFORE </div>.

Thanks!

Simeon Visser
Goodies

2 Answers

You can use cut instead:

cut -c 6-28 mysite.html

This prints characters 6 through 28 of each line. It relies on the fact that both the CODE= prefix (5 characters) and the code that follows it (23 characters in the example) have known lengths. To keep the CODE= prefix in the output, use -c 1-28 instead.
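
Because cut slices every line, not just the CODE= ones, it may help to filter with grep first. The following is a minimal sketch under stated assumptions: the sample file contents are made up from the example in the question, and the variable names (sample, with_prefix, code_only) are only illustrative.

```shell
# Build a small sample file (hypothetical contents modelled on the question).
sample=$(mktemp)
printf '%s\n' \
  '<html>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><span>rest of a long line</span>' \
  '</html>' > "$sample"

# Keep only lines that begin with CODE=, then slice by character position.
# Columns 1-28 keep the CODE= prefix; columns 6-28 drop it.
with_prefix=$(grep '^CODE=' "$sample" | cut -c 1-28)
code_only=$(grep '^CODE=' "$sample" | cut -c 6-28)

echo "$with_prefix"
echo "$code_only"
rm -f "$sample"
```

This only works while the prefix and code lengths stay fixed; if the code length ever varies, a pattern-based approach is the safer choice.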

Simeon Visser

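You can also use sed:

sed -rn 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' file

This matches any line starting with CODE= followed by 23 characters that are letters, numbers, or dashes, followed by </div>, and prints only the part before </div>. The dash is placed last inside the brackets so that it is taken literally; a backslash there would not escape it but would be matched as a character in its own right.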
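
As a rough illustration of how the substitution behaves on matching and non-matching lines, here is a small sketch. The helper name extract_code and the sample input lines are hypothetical. It uses -E, which both GNU and BSD sed accept for extended regular expressions, and writes the bracket expression with the dash last so that it is literal.

```shell
# Hypothetical helper: print the CODE=... part of each matching line.
extract_code() {
  # -n suppresses the default output; the trailing p prints only lines
  # where the substitution succeeded. -E enables extended regex syntax.
  sed -nE 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' "$@"
}

# Feed three made-up lines: one unrelated, one valid, one with a short code.
result=$(printf '%s\n' \
  '<p>unrelated line</p>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><span>rest</span>' \
  'CODE=too-short</div>' | extract_code)

echo "$result"
```

Only the line whose code is exactly 23 characters long and is immediately followed by </div> is printed; the other two are dropped because the substitution does not apply to them.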

ray