Deleting multiple HTML lines in multiple files with Linux commandline

Question

I have 1000+ html files, all with more than 1000 lines, on a Linux server.
Most of the files have a particular part of html code that needs to be deleted.

The part that I need to delete looks roughly like this:

<div class="LoginOuterCssClass" id="ctl07">
    ...
</div>

Is there some script or command-line solution for this?

Commands like the following didn't help:

X,Ys/search/replace/g
1,2s/\([a-z]*\), \([a-z]*\)/\2 \1/ig
s/<[^]*>//g

Help would be much appreciated!

Which command did you try? Your examples are just patterns, not real commands. – newman, Dec 14 '12 at 15:35
see this question for using *sed* and *grep* to delete one line of text from several files: http://stackoverflow.com/q/1182756/1284631 — user1284631, Dec 14 '12 at 15:36
What you're talking about is parsing HTML, and simple command-line tools are not up to the task. What if there's a `<div>` inside of the `<div>` you want deleted, for example? What if the closing `</div>` isn't on a line by itself? You need a proper HTML parser. – Andy Lester, Dec 14 '12 at 16:49
I used the `find | xargs sed` command. There are 42 lines of HTML and several divs inside the div I want to delete, none of them on the same line. Andy, you mention a proper HTML parser; what could I use? – Mike Madern, Dec 17 '12 at 08:17
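
For the nested case raised in these comments, a line-based depth counter in awk is one possible middle ground. This is only a sketch, not an HTML parser: it assumes the opening tag of the target div appears exactly as in the question, and it would be fooled by div tags inside comments, scripts, or attribute values. The sample file content and the variable names are made up for illustration.

```shell
# Hypothetical sample with a div nested inside the block to delete.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p>before</p>
<div class="LoginOuterCssClass" id="ctl07">
    <div class="inner">
        <span>login form</span>
    </div>
</div>
<p>after</p>
EOF

nested_result=$(awk '
{
    # Count opening and closing div tags on this line.
    # gsub returns the number of matches; "&" puts each match back unchanged.
    line = $0
    opens  = gsub(/<div[ >]/, "&", line)
    closes = gsub(/<\/div>/, "&", line)
}
# Start skipping at the opening tag of the target div.
!inside && /<div class="LoginOuterCssClass" id="ctl07">/ {
    inside = 1
    depth = 0
}
# While inside the block, track nesting depth and print nothing.
inside {
    depth += opens - closes
    if (depth <= 0) inside = 0
    next
}
# Everything outside the block is printed unchanged.
{ print }
' "$tmp")

printf '%s\n' "$nested_result"
rm -f "$tmp"
```

Because awk does not edit in place here, the output would have to be written to a new file and moved over the original once it has been checked.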

dogbane · Accepted Answer · 2012-12-17T08:42:35.547

Try the following sed command on one file and see if it does what you want:

sed -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p' file.html
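
To make the one-liner easier to follow, here is the same script split across lines with comments, run against a small sample file. The sample content and the variable names are invented for illustration, and the multi-line form assumes GNU sed.

```shell
# Hypothetical sample file, for illustration only.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p>before</p>
<div class="LoginOuterCssClass" id="ctl07">
    <span>login form</span>
</div>
<p>after</p>
EOF

# -n turns off automatic printing, so only the explicit "p" prints.
result=$(sed -n '
/<div class="LoginOuterCssClass" id="ctl07">/ {
    # label "a": start of the loop
    :a
    # append the next input line to the pattern space
    N
    # no closing div tag collected yet: jump back to "a"
    /<\/div>/!ba
    # pull in the line that follows the closing tag
    N
    # drop everything up to the last newline, keeping only that following line
    s/.*\n//
}
# print whatever is left in the pattern space
p
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Note that the loop ends as soon as the collected text contains any `</div>`, so a div nested inside the block would end the deletion early.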

To run this on multiple files and edit them in-place, you run find and pass the files to sed via xargs as shown below:

find /some/path -name "*.html" -print0 | xargs -0 sed -i -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'
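
Since `-i` overwrites the files, it may be worth trying the command on a scratch copy first and keeping backups. The sketch below assumes GNU sed, where `-i.bak` saves each original with a `.bak` suffix; the temporary directory, the file names, and the sample content are made up for illustration.

```shell
# Scratch directory with two hypothetical sample files.
dir=$(mktemp -d)
for name in a b; do
cat > "$dir/$name.html" <<'EOF'
<p>before</p>
<div class="LoginOuterCssClass" id="ctl07">
    <span>login form</span>
</div>
<p>after</p>
EOF
done

# Same pipeline as above, but -i.bak keeps file.html.bak next to each edited file.
find "$dir" -name "*.html" -print0 |
    xargs -0 sed -i.bak -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'

# Inspect one edited file and count the backups before cleaning up.
edited=$(cat "$dir/a.html")
backups=$(find "$dir" -name "*.bak" | wc -l)
printf '%s\n' "$edited"
rm -rf "$dir"
```

Once the edited files look right, the `.bak` files can be removed with a second `find`.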

edited Dec 17 '12 at 08:42

answered Dec 14 '12 at 15:44


I tried `find /some/path -name "*.html" -print0 | xargs -0 sed -in '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'`, and it works almost perfectly! Thank you very much! But the content of the file I tested now has every line doubled: a line that appeared once now appears twice in a row. How could this happen? – Mike Madern Dec 17 '12 at 08:25
I have fixed the `sed` command. Should have been `-i -n`, not `-in`. – dogbane Dec 17 '12 at 08:43
You just made me very happy! :D Thank you very much! – Mike Madern Dec 17 '12 at 08:51
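
The doubled lines discussed in these comments can be reproduced in isolation. As far as GNU sed is concerned, `-in` is read as `-i` with the backup suffix `n`, so `-n` is never enabled and every line is printed once automatically and once by `p`. The sketch below uses a made-up two-line file and hypothetical variable names.

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

# -in: in-place edit with backup suffix "n"; automatic printing stays on,
# so the explicit "p" prints each line a second time.
sed -in 'p' "$tmp"
doubled=$(cat "$tmp")

# Reset the file and use the separate flags: only "p" prints now.
printf 'one\ntwo\n' > "$tmp"
sed -i -n 'p' "$tmp"
fixed=$(cat "$tmp")

printf 'with -in:\n%s\n' "$doubled"
printf 'with -i -n:\n%s\n' "$fixed"

# The first run also left a backup named after the file plus "n".
rm -f "$tmp" "${tmp}n"
```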
