Hello everyone, I have a folder full of HTML files that I want to convert to text files. I am working on Ubuntu and, unfortunately, lynx (which I would run as lynx --dump) will not install for me. Is there an alternative way to convert the HTML files to text files? Please help! Thanks in advance.

Sanathana

1 Answer

This question is tagged python, so my first choice would be Aaron Swartz's html2text. It produces text in Markdown format.

Python solutions are also possible with BeautifulSoup.
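If installing extra packages is a problem, the same idea can be sketched with only the Python standard library. Note that this uses html.parser rather than BeautifulSoup or html2text, so it is a rough substitute: it keeps the text nodes, skips script and style contents, and does no layout. The names TextExtractor, html_to_text and convert_folder are made up for this example, and it assumes the files end in .html and are UTF-8 encoded.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_folder(folder):
    """Write a .txt file next to every .html file in the given folder."""
    for path in sorted(Path(folder).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        path.with_suffix(".txt").write_text(html_to_text(html), encoding="utf-8")
```

To convert a whole folder, call convert_folder with its path, for example convert_folder("/path/to/html_folder") (a placeholder path). Each page.html then gets a page.txt beside it.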

If you like Perl, here is a simple Perl script to convert HTML to text:

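#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parse;
use HTML::FormatText;

# The HTML file to convert is the first command-line argument.
my $file = $ARGV[0];
die "Usage: $0 FILE.html\n" unless defined $file;
if (not -r $file) {
    die "ERROR: File ($file) is not readable\n";
}

# Slurp the whole file, then render the parsed HTML as plain text.
open(my $in, '<', $file) or die "ERROR: Cannot open $file: $!\n";
my $html = do { local $/; <$in> };
close($in);
my $plain = HTML::FormatText->new->format(parse_html($html));
print $plain;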
John1024