I have a lot of HTML files such as:

<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">some text</P>
<TABLE class=MsoNormalTable style="BORDER-RIGHT: windowtext 1pt solid;" cellSpacing=0 cellPadding=0 width=568 border=1>
<TR style="HEIGHT: 12.75pt; mso-yfti-irow: 0; mso-yfti-firstrow: yes">
<TD style="BORDER-RIGHT: windowtext 1pt solid;" width=357 colSpan=2>text td</TD>
</TR>
</TABLE>

I need to remove all attributes and classes from it, so I get:

<P>some text</P>
<TABLE>
<TR>
<TD>text td</TD>
</TR>
</TABLE>

I've tried the tidy utility with different options (drop-proprietary-attributes, word-2000), but I can't get clean code.

Dimetry

1 Answer

This removes all MS styles:

tidy --word-2000 true --bare true -o output.html input.htm 

I use "HTML Tidy for Linux version 5.1.25".
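
If every attribute has to go (width, border, colSpan and so on, not only the MS styles), a small script may be simpler than tuning tidy options. Below is a minimal sketch, not a tidy feature: it uses Python's standard html.parser module and re-emits each tag without its attributes. The names AttributeStripper and strip_attributes are made up for this example.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit the parsed HTML, dropping every attribute from every tag."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &nbsp; as written
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: only the bare tag name is kept
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html):
    """Return html with all tag attributes removed (hypothetical helper)."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">some text</P>'
    print(strip_attributes(sample))
```

Note that html.parser lowercases tag names, so the result contains <p> rather than <P>. The sketch does no repair of broken markup; running tidy first and this afterwards is one possible combination.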

dlnsk