5

I have a bunch of HTML files and need to convert them to formatted text with Perl, i.e. something like <br/> should be interpreted as \n.

I found the Perl module HTML::FormatText on CPAN. It formats the text well, but it strips any links. Is there an option in HTML::FormatText to convert the HTML to text as it does now, but keep the URL when there are links like this?

<a href="http://www.microsoft.com">http://www.microsoft.com</a>

i.e. something like this:

<br /><b>Microsoft</b><br /><a href="http://www.microsoft.com">

will be converted to:

microsoft
http://www.microsoft.com
Brad Gilbert
smith
  • 2
    I always use *lynx* to do this, because I’ve never found anything better. I would love to, though. – tchrist Jan 11 '12 at 20:29
  • 1
    If you already have `lynx` installed, there's [`HTML::FormatText::Lynx`](http://search.cpan.org/perldoc?HTML::FormatText::Lynx) – mob Jan 11 '12 at 21:02

1 Answer

7

Take a look at HTML::FormatText::WithLinks

Setting the after_link option to, say, " (%l)" will put the link in line after the anchor text. In your example you would get Microsoft (http://www.microsoft.com).
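As a rough sketch of the inline setup described above: the sample HTML is an assumption modelled on the question (with the href quote closed and "Microsoft" as the anchor text), and clearing before_link and footnote follows the inline example in the module's Synopsis rather than anything stated in this answer. Adjust the format strings to taste.

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# Sample input modelled on the question (assumed, not the asker's real file).
my $html = '<br /><b>Microsoft</b><br />'
         . '<a href="http://www.microsoft.com">Microsoft</a>';

# Clearing before_link and footnote and setting after_link keeps each
# URL next to its anchor text instead of collecting footnotes at the end.
my $formatter = HTML::FormatText::WithLinks->new(
    before_link => '',
    after_link  => ' (%l)',   # %l is replaced by the link URL
    footnote    => '',
);

# parse() returns undef on failure; error() reports what went wrong.
my $text = $formatter->parse($html);
die 'could not parse HTML: ' . $formatter->error unless defined $text;

print $text;
```

For whole files, the module also provides parse_file, which takes a file name instead of an HTML string.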

Borodin
  • 126,100
  • 9
  • 70
  • 144
  • I saw this module. It inserts the links at the end as footnotes. Is it also possible to keep each link in its position? – smith Jan 11 '12 at 20:21
  • The second example in the Synopsis shows how to do it (the example has the URL in brackets, but it doesn't have to be). – theglauber Jan 11 '12 at 23:07
  • Yes you can put the link in line. I have modified my answer to explain. – Borodin Jan 12 '12 at 01:22