
I'm working on a project that involves converting a large amount of HTML content to plain text. I have a custom-written module that does the job OK, but I'm wondering whether there are any standard tools to help get the job done.

Brian Tol

2 Answers


Html2Text seems to be a good option.

Chris Ballance

Here's a Python library that does HTML parsing:

BeautifulSoup is another option.
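If you'd rather not pull in a third-party package, a minimal sketch of the same idea is possible with only the standard library's `html.parser` module. This is not how Html2Text or BeautifulSoup work internally; `TextExtractor`, `html_to_text`, and the particular set of "block" tags treated as line breaks are hypothetical choices for illustration. It drops `<script>`/`<style>` contents, decodes entities such as `&amp;`, and collapses whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping <script>/<style> contents."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start/end a line in the plain-text output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. automatically
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        # Ignore text that sits inside <script> or <style>.
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Hypothetical convenience wrapper: HTML string in, plain text out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


print(html_to_text("<p>Fish &amp; chips</p><script>var x = 1;</script><p>Second   line</p>"))
```

This prints `Fish & chips` and `Second line` on separate lines. It makes no attempt at the layout work a dedicated converter does (tables, link footnotes, list bullets), so for a large, varied corpus one of the libraries above is probably the better starting point.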

tcarobruce
To save others some time circling from Google back to SO, here is a Q&A explaining that Beautiful Soup is not really maintained anymore: [WebScraping with BeautifulSoup or LXML.HTML](http://stackoverflow.com/questions/5493514/webscraping-with-beautifulsoup-or-lxml-html). – sage Jul 14 '11 at 14:59
I think Beautiful Soup appears to be maintained again now. – contrebis Nov 29 '12 at 15:49