Hello everyone, I have a folder full of HTML files that I want to convert to text files. I am working on Ubuntu and unfortunately I cannot get lynx installed, so lynx --dump is not an option. Is there an alternative way to convert the HTML files to text files? Please help! Thanks in advance.
1 Answer
This question is tagged python, so my first choice would be Aaron Swartz's html2text. It produces text in Markdown format.
Python solutions are also possible with BeautifulSoup.
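If installing extra packages is a problem (as it was with lynx), a rough conversion is also possible with only Python's standard library. The following is a minimal sketch, not html2text or BeautifulSoup: it uses html.parser, puts each text fragment on its own line, and skips script and style contents. The names TextExtractor, html_to_text and convert_folder are made up for this example, and it assumes the files end in .html and are readable as UTF-8.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_folder(folder):
    """Write a .txt file next to every .html file in `folder` (assumed layout)."""
    written = []
    for src in sorted(Path(folder).glob("*.html")):
        html = src.read_text(encoding="utf-8", errors="replace")
        dst = src.with_suffix(".txt")
        dst.write_text(html_to_text(html), encoding="utf-8")
        written.append(dst)
    return written
```

Calling convert_folder("pages") would process a folder named pages, which is only a placeholder here. The output is cruder than what html2text produces, since no attempt is made to keep headings, links or paragraph structure.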
If you like Perl, here is a simple Perl script to convert HTML to text:
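#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parse;        # provides parse_html()
use HTML::FormatText;

# The HTML file to convert is the first command-line argument.
my $file = $ARGV[0];
die "Usage: $0 FILE.html\n" unless defined $file;
if (not -r $file) {
    die "ERROR: File ($file) is not readable\n";
}
# Slurp the whole file into one string.
my $html = do {
    local $/;
    open(my $fh, '<', $file) or die "ERROR: Cannot open $file: $!\n";
    <$fh>;
};
# Parse the HTML and print it as plain text on standard output.
my $plain = HTML::FormatText->new->format(parse_html($html));
print $plain;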

John1024