How to remove/strip all formatting or styling information from HTML table code?

I need to remove all coloring, font sizing, etc., probably by completely removing all `style` and `class` attributes.

More precisely, I would like to remove only certain tags and attributes. By removing a tag I mean keeping its content but removing its opening and closing tags.

Deduplicator
Dims
  • Not only the `style` attribute should be removed, but probably also the `class` attribute. If some content is wrapped in a `span` tag with `class` or `style` attributes, then the entire `span` tag should be removed. `table`, `td` and `tr` tags should not be removed – Dims Jan 23 '14 at 21:18

1 Answer

I did something like this years ago in VB6; the code is copied below. As you can see, it steps through the HTML character by character and removes everything between (and including) the < and > characters, so every tag is dropped and only the text outside the tags is kept. Hopefully you can do something similar in whatever tool you are using.

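Function CleanTags(HTML As String) As String
  ' Strips every tag: copies only the characters that lie outside <...>.
  Dim result As String, insideTag As Boolean, c As String, i As Long
  insideTag = False
  For i = 1 To Len(HTML)
    c = Mid(HTML, i, 1)
    If c = "<" Then insideTag = True            ' a tag starts here
    If Not insideTag Then result = result & c   ' keep text outside tags
    If c = ">" Then insideTag = False           ' the tag ends here
  Next i
  CleanTags = result
End Function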
mti2935
  • This will remove ALL tags. I need to remove only the formatting tags – Dims Jan 23 '14 at 21:20
  • In that case, the only way I can think of to do this is with a whitelist of allowed tags or a blacklist of disallowed tags. – mti2935 Jan 23 '14 at 21:30
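
The whitelist/blacklist idea from the last comment could look roughly like the sketch below. It is written in Python rather than VB6, since the question does not name a tool, and it uses only the standard-library `html.parser` module. The names `KEEP_TAGS`, `DROP_ATTRS`, `SKIP_CONTENT`, `TableCleaner` and `clean_table_html` are made up for this illustration, and the contents of the tag and attribute lists are assumptions that would need adjusting to the actual input.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: structural table tags that are kept.
KEEP_TAGS = {"table", "caption", "thead", "tbody", "tfoot", "tr", "th", "td"}
# Assumed blacklist: presentational attributes dropped from kept tags.
DROP_ATTRS = {"style", "class", "bgcolor", "color", "align", "valign",
              "width", "height"}
# Elements whose text content is discarded as well (CSS or script source).
SKIP_CONTENT = {"style", "script"}


class TableCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []   # pieces of the cleaned document
        self.skip = 0   # > 0 while inside a <style> or <script> element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip += 1
            return
        if tag not in KEEP_TAGS:
            return  # "unwrap": drop the tag itself, keep whatever it contains
        parts = [tag]
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:  # attribute without a value
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self.skip:
                self.skip -= 1
            return
        if tag in KEEP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            # Entities were decoded by the parser, so re-escape the text.
            self.out.append(escape(data, quote=False))


def clean_table_html(html):
    parser = TableCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `clean_table_html('<table class="x"><tr><td colspan="2"><span style="color:red">Hi</span></td></tr></table>')` returns `<table><tr><td colspan="2">Hi</td></tr></table>`: the `span` is unwrapped, `class` and `style` are gone, and `colspan` survives because it is not on the attribute blacklist. Comments and doctype declarations are silently dropped by this sketch, and badly malformed markup may need a more forgiving parser.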