Grep and Regex an HTML File

Question

I have an HTML file with thousands of lines, and one pattern repeats throughout it:

CODE=12345-ABCDE-12345-ABCDE</div>...<!--This line goes on for hundreds of characters-->

Every one of these lines starts with "CODE=", and the code that follows always has the same length: 23 characters (28 counting "CODE="), each a letter, digit, or dash.

So far I can find the matching lines with:

cat mysite.html | grep "CODE="

But I'd like a regex that displays only the part of each line BEFORE </div>.

Thanks!

score 1 · Answer 1 · answered Dec 21 '13 at 20:54


You can use cut instead:

grep '^CODE=' mysite.html | cut -c 6-28

This prints characters 6 through 28 of each line, i.e. the 23-character code without the CODE= prefix. It relies on the fact that both CODE= (5 characters) and the code that follows (23 characters) have fixed lengths.
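To make the column arithmetic concrete, here is a minimal sketch that runs the pipeline on a single line modeled on the sample in the question. The line contents and the `line`/`code` variable names are illustrative assumptions, not taken from the real file:

```shell
#!/bin/sh
# Hypothetical input line shaped like the sample in the question
line='CODE=12345-ABCDE-12345-ABCDE</div>...<!--rest of the line-->'

# "CODE=" fills columns 1-5, so the 23-character code sits in columns 6-28;
# column 29 is the "<" that starts "</div>"
code=$(printf '%s\n' "$line" | grep '^CODE=' | cut -c 6-28)

printf '%s\n' "$code"   # prints 12345-ABCDE-12345-ABCDE
```

Using `cut -c 1-28` instead would keep the CODE= prefix, and extending the range past column 28 would start pulling in the `<` of `</div>`.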

Simeon Visser

Thanks for the tip! This worked like a charm: `cat mysite.html | grep "CODE=" | cut -c 6-29` – Goodies Dec 21 '13 at 20:56

@Goodies You don't need to use `cat` here. `grep "CODE=" mysite.html` is the same as `cat mysite.html | grep "CODE="`. – ChrisGPT was on strike Dec 21 '13 at 20:58
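A minimal sketch of that equivalence, using a throwaway temporary file in place of the real mysite.html (its contents and the variable names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical stand-in for mysite.html: one matching line, one other line
tmp=$(mktemp)
printf '%s\n' 'CODE=12345-ABCDE-12345-ABCDE</div>...' '<p>other markup</p>' > "$tmp"

# Piping through cat starts an extra process...
with_cat=$(cat "$tmp" | grep "CODE=")
# ...while grep can open the file itself and produce the same output
without_cat=$(grep "CODE=" "$tmp")

rm -f "$tmp"
printf '%s\n' "$without_cat"   # prints CODE=12345-ABCDE-12345-ABCDE</div>...
```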

ray · Answer 2 · 2013-12-22T02:53:53.167

You can also use sed:

sed -En 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' mysite.html

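This matches any line starting with CODE= followed by exactly 23 letters, digits, or dashes and then </div>, and replaces the whole line with the captured part (CODE= plus the code). Because of -n, only lines where the substitution succeeds are printed (via the p flag).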
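To see which lines survive, here is a sketch that runs the command on a small made-up input (the sample lines and variable names are assumptions). It also includes a hypothetical length-independent variant that captures whatever precedes </div>, for the case where the code length is not fixed:

```shell
#!/bin/sh
# Hypothetical input: one well-formed line, one unrelated line, one short code
input='CODE=12345-ABCDE-12345-ABCDE</div>...<!--long tail-->
<p>unrelated markup</p>
CODE=too-short</div>'

# -E: extended regex; -n: print nothing by default.
# The p flag prints a line only when the substitution succeeded.
# The dash is placed last in the bracket expression so it is literal.
strict=$(printf '%s\n' "$input" |
  sed -En 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p')

# Length-independent variant (assumes the code never contains "<"):
# capture everything up to the first "<" and require it to start "</div>"
loose=$(printf '%s\n' "$input" |
  sed -n 's@^\(CODE=[^<]*\)</div>.*@\1@p')

printf '%s\n' "$strict"   # CODE=12345-ABCDE-12345-ABCDE
printf '%s\n' "$loose"    # also prints CODE=too-short
```

The strict version rejects CODE=too-short because it is not 23 characters long; the loose one keeps it, which is the trade-off of not hard-coding the length.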

answered Dec 22 '13 at 01:48

ray
