
I have 1000+ HTML files on a Linux server, each more than 1000 lines long.
Most of the files contain a particular block of HTML code that needs to be deleted.

The part I need to delete looks roughly like this:

<div class="LoginOuterCssClass" id="ctl07">
    ...
</div>

Is there some script or command-line solution for this?

Commands like the following didn't help:

X,Ys/search/replace/g
1,2s/\([a-z]*\), \([a-z]*\)/\2 \1/ig
s/<[^]*>//g

Help would be much appreciated!

Andy Lester
Mike Madern
  • What command did you try for this? Your example shows only some patterns, not a real command. – newman Dec 14 '12 at 15:35
  • see this question for using *sed* and *grep* to delete one line of text from several files: http://stackoverflow.com/q/1182756/1284631 – user1284631 Dec 14 '12 at 15:36
  • What you're talking about is parsing HTML, and simple command-line tools are not up to the task. What if there's a `<div>` inside of the `<div>` you want deleted, for example? What if the closing `</div>` isn't on a line by itself? You need a proper HTML parser. – Andy Lester Dec 14 '12 at 16:49
  • I used the `find | xargs sed` command. There are 42 lines of HTML and several divs inside the div I want to delete; none of them are on the same line. Andy, you mention a proper HTML parser: what could I use? – Mike Madern Dec 17 '12 at 08:17
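Short of a full HTML parser, the nested-`<div>` case Andy describes can be handled with a depth counter. The sketch below uses awk and is a line-based heuristic, not a parser. The function name `strip_login_div` is hypothetical. It assumes that every `<div` and `</div>` tag sits whole on one line, that the block's first and last lines contain nothing else worth keeping, and that no div tags appear inside comments, scripts, or attribute values.

```shell
# Hypothetical helper: print file "$1" without the LoginOuterCssClass block.
# Line-based heuristic (not an HTML parser): assumes each <div ...> and
# </div> tag is whole on one line and that no div tags hide in comments,
# scripts or attribute values.
strip_login_div() {
  awk '
    # Opening tag of the block to remove: start tracking nesting depth
    !skip && index($0, "<div class=\"LoginOuterCssClass\" id=\"ctl07\">") {
      skip = 1
      depth = 0
    }
    skip {
      line = $0
      depth += gsub(/<div[ \t>]/, "", line)  # opening div tags on this line
      depth -= gsub(/<\/div>/, "", line)     # closing div tags on this line
      if (depth <= 0) skip = 0               # matching outer </div> reached
      next                                   # drop every line of the block
    }
    { print }                                # keep everything else
  ' "$1"
}
```

The function writes to standard output. To edit a file in place, one option is to write to a temporary file and move it over the original, for example `strip_login_div page.html > page.tmp && mv page.tmp page.html`.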

1 Answer


Try the following sed command on one file and see if it does what you want:

sed -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p' file.html
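For reference, here is the same sed program split into a commented script and wrapped in a shell function so it can be tried on a single file. The name `remove_login_div` is hypothetical. It assumes GNU sed, which accepts this label-and-branch loop and lets `.` match the newlines that `N` adds. Note that the loop stops at the first `</div>` it sees, so a nested inner `</div>` would end the deleted range early.

```shell
# Hypothetical wrapper around the sed program above (GNU sed assumed).
remove_login_div() {
  sed -n '
    # Start of the block to delete
    /<div class="LoginOuterCssClass" id="ctl07">/{
      # Keep appending lines until a </div> is in the pattern space
      :a
      N
      /<\/div>/!ba
      # Append the line after </div>, then drop everything up to the last newline
      N
      s/.*\n//
    }
    # -n suppresses auto-print, so print each remaining line explicitly
    p
  ' "$1"
}
```

Running `remove_login_div page.html` prints the result without touching the file, which makes it easy to compare with the original before using `-i`.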

To run this on multiple files and edit them in place, use find and pass the files to sed via xargs:

find /some/path -name "*.html" -print0 | xargs -0 sed -i -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'
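A variation, assuming GNU grep, sed and xargs: since not every file contains the block, grep can select just the files that do (`-l` lists matching files, `-Z` separates their names with NUL bytes), and `sed -i.bak` keeps a backup of each edited file. The function name `strip_dir` is hypothetical, and the `.bak` files should be removed once the results have been checked.

```shell
# Hypothetical helper: edit in place only the .html files under "$1" that
# contain the marker, keeping a .bak copy of each one (GNU tools assumed).
strip_dir() {
  grep -rlZ --include='*.html' 'class="LoginOuterCssClass" id="ctl07"' "$1" |
    xargs -0 -r sed -i.bak -n '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'
}
```

`xargs -r` skips running sed entirely when grep finds no matching files.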
dogbane
  • I tried `find /some/path -name "*.html" -print0 | xargs -0 sed -in '/<div class="LoginOuterCssClass" id="ctl07">/{:a;N;/<\/div>/!ba;N;s/.*\n//};p'` and it works almost perfectly! Thank you very much! But in the file I tested, every line now appears twice. How could this happen? – Mike Madern Dec 17 '12 at 08:25
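  • I have fixed the `sed` command. It should have been `-i -n`, not `-in`: GNU sed reads `-in` as `-i` with the backup suffix `n`, so `-n` is never set and the final `p` prints every line a second time (it also leaves backup copies such as `file.htmln`). – dogbane Dec 17 '12 at 08:43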
  • You just made me very happy! :D Thank you very much! – Mike Madern Dec 17 '12 at 08:51