22

I'm looking for a simple way to HTML encode a string/object in Perl. The fewer additional packages used the better.

brian d foy
Phill Pafford

3 Answers

34

HTML::Entities is your friend here.

use HTML::Entities;
my $encoded = encode_entities( "foo & bar & <baz>" );
# $encoded now holds: foo &amp; bar &amp; &lt;baz&gt;
cjm
friedo
31

When this question was first answered, HTML::Entities was the module most people probably used. It's pure Perl and by default will escape the HTML reserved characters ><'"& as well as control characters and high-bit (including wide) characters.
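As a minimal sketch of that default behaviour next to a narrower one: `encode_entities` takes an optional second argument listing the characters to escape. The sample string is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Illustrative input (not from the question); \x{e9} is "é".
my $text = "caf\x{e9} <b>& co</b>";

# Default set: reserved characters plus control and high-bit characters.
my $default = encode_entities($text);

# Second argument: escape only the characters listed here.
my $reserved_only = encode_entities( $text, q{<>&"'} );

binmode STDOUT, ':encoding(UTF-8)';
print "$default\n";          # caf&eacute; &lt;b&gt;&amp; co&lt;/b&gt;
print "$reserved_only\n";    # café &lt;b&gt;&amp; co&lt;/b&gt;
```

Restricting the set this way is useful when the page is already served as UTF-8 and non-ASCII text can be sent as-is.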

Recently, HTML::Escape showed up. It has both XS and pure Perl implementations. If you're using the XS version, it's about twenty times faster than HTML::Entities in the run below. However, it only escapes ><'"& and has no way to change the defaults. Here's the difference with the XS version:

Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (14.09 usr +  0.01 sys = 14.10 CPU) @ 709.22/s (n=10000)
html_escape:  1 wallclock secs ( 0.68 usr +  0.00 sys =  0.68 CPU) @ 14705.88/s (n=10000)

And here's the fair fight with pure Perl versions on each side:

Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (13.79 usr +  0.01 sys = 13.80 CPU) @ 724.64/s (n=10000)
html_escape:  7 wallclock secs ( 7.57 usr +  0.01 sys =  7.58 CPU) @ 1319.26/s (n=10000)

You can get these benchmarks in Surveyor::Benchmark::HTMLEntities. I explain how I distribute benchmarks using Surveyor::App.
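For readers who just want to try a comparison locally, here is a minimal sketch using the core Benchmark module. It is not the code from Surveyor::Benchmark::HTMLEntities; the input string and iteration count are assumptions chosen for illustration, and each module is only timed if it is installed.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Illustrative input; the data used for the numbers above is not shown here.
my $text = q{<p class="x">Tom & Jerry's "show"</p>} x 20;

my %candidates;

# Register each module only if it can be loaded, so the sketch runs anywhere.
if ( eval { require HTML::Entities; 1 } ) {
    $candidates{html_entities} = sub {
        # Assign the result: in void context encode_entities
        # modifies its argument in place.
        my $out = HTML::Entities::encode_entities($text);
        return;
    };
}
if ( eval { require HTML::Escape; 1 } ) {
    $candidates{html_escape} = sub {
        my $out = HTML::Escape::escape_html($text);
        return;
    };
}

if (%candidates) {
    # Assumed iteration count; raise it for steadier timings.
    timethese( 1_000, \%candidates );
}
else {
    print "Neither module is installed; nothing to compare.\n";
}
```

Timings from a sketch like this depend heavily on the input, so treat them as rough guidance only.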

w.k
brian d foy
  • Given the fact that `HTML::Entities` looks for wide characters too, the pure Perl fight might not be that fair. It could be interesting to alter the code in the pure Perl version of `HTML::Escape` to include the same cases under its own algorithm and see that fight again. – Francisco Zarabozo Apr 01 '15 at 09:11
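
To make the idea in this comment concrete, here is a rough pure-Perl sketch of a fixed-set escaper extended to cover non-ASCII characters as well. The names `%ENTITY_FOR` and `escape_with_wide` are hypothetical, and this is not the actual code of HTML::Escape or HTML::Entities; it only illustrates the kind of change being suggested. It emits numeric references rather than named entities and needs Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Hypothetical table for the five reserved characters.
my %ENTITY_FOR = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: one pass over the string, so an "&" produced by a
# replacement is never escaped twice. Non-ASCII characters become numeric
# character references. Expects a decoded character string, not raw bytes.
sub escape_with_wide {
    my ($str) = @_;
    return '' unless defined $str;
    $str =~ s/([&<>"']|[^\x00-\x7F])/$ENTITY_FOR{$1} \/\/ sprintf('&#%d;', ord $1)/ge;
    return $str;
}

print escape_with_wide("caf\x{e9} & <b>"), "\n";
# prints: caf&#233; &amp; &lt;b&gt;
```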
4

Which do you need to encode, a string or an object? If it's just a string, then you should just have to worry about encoding issues such as UTF-8, and CGI::escape will probably do the trick for you. If it's an object, you'll need to serialize it first, which opens up a whole new set of issues, but you might want to consider JSON-encoding it.

PS. Since I can't find any recent documentation on this method (it's actually imported from CGI::Util and is marked as "internal"), you should probably use escapeHTML() instead, as daxim points out in his comment: http://search.cpan.org/perldoc?CGI#AUTOESCAPING_HTML

Ether
  • The function is called `escapeHTML`. Proper deeplink: http://search.cpan.org/perldoc?CGI#AUTOESCAPING_HTML – daxim Jan 21 '10 at 13:36
  • @daxim: `CGI::escape` very much exists; it's actually defined in CGI::Util and imported into CGI proper. If you look at the source there are some subtle differences in implementation, which are sadly not well described in the documentation. – Ether Jan 21 '10 at 19:03
  • Alright. I'm not able to undo the vote because it is too old. – daxim Jan 21 '10 at 19:08
  • @daxim: I've edited the post so you get another crack at that vote :) – Ether Jan 21 '10 at 19:31