I'm looking for a simple way to HTML encode a string/object in Perl. The fewer additional packages used the better.
What exactly do you mean by "HTML encode"? Can you give an example input and the desired output? – cjm Jan 20 '10 at 21:33
What character sets/locales do you have to handle? – pilcrow Jan 20 '10 at 21:36
3 Answers
HTML::Entities is your friend here.
use HTML::Entities;
my $encoded = encode_entities( "foo & bar & <baz>" );
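As a slightly fuller sketch of the call above: `encode_entities` also accepts an optional second argument listing the characters to treat as unsafe. The variable names here are only illustrative, and the restricted set `<>&` is an arbitrary choice for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $input = 'foo & bar & <baz>';

# Default behaviour: encodes control chars, high-bit chars, and < & > ' "
my $default = encode_entities($input);

# Optional second argument: encode only the characters listed here
my $only_markup = encode_entities($input, '<>&');

print "$default\n";      # foo &amp; bar &amp; &lt;baz&gt;
print "$only_markup\n";  # foo &amp; bar &amp; &lt;baz&gt;
```

For plain ASCII input like this the two calls give the same result. They differ once the string contains non-ASCII characters, which only the default call turns into entities.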
When this question was first answered, HTML::Entities was the module most people probably used. It's pure Perl and by default will escape the HTML reserved characters `><'"&` along with control and high-bit characters (which includes wide characters).
Recently, HTML::Escape showed up. It comes in both XS and pure-Perl versions. If you're using the XS version, it's about ten times faster than HTML::Entities. However, it only escapes `><'"&` and has no way to change the defaults. Here's the difference with the XS version:
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (14.09 usr + 0.01 sys = 14.10 CPU) @ 709.22/s (n=10000)
html_escape: 1 wallclock secs ( 0.68 usr + 0.00 sys = 0.68 CPU) @ 14705.88/s (n=10000)
And here's the fair fight with pure Perl versions on each side:
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (13.79 usr + 0.01 sys = 13.80 CPU) @ 724.64/s (n=10000)
html_escape: 7 wallclock secs ( 7.57 usr + 0.01 sys = 7.58 CPU) @ 1319.26/s (n=10000)
You can get these benchmarks in Surveyor::Benchmark::HTMLEntities. I explain how I distribute benchmarks using Surveyor::App.
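Since the question asks for as few extra packages as possible, here is a minimal sketch that escapes the same five characters (`><'"&`) with no modules at all. The function name `escape_html_minimal` and the lookup table are hypothetical, written for this example. It is not the implementation used by either module and it was not part of the benchmarks above.

```perl
use strict;
use warnings;

# Hypothetical lookup table: the five reserved characters and their entities
my %ENTITY_FOR = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper, not taken from HTML::Entities or HTML::Escape
sub escape_html_minimal {
    my ($text) = @_;
    return '' unless defined $text;
    # A single pass over the string, so the "&" of an inserted entity
    # is never escaped a second time
    $text =~ s/([&<>"'])/$ENTITY_FOR{$1}/g;
    return $text;
}

print escape_html_minimal(q{<a href="x">Tom & Jerry's</a>}), "\n";
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

This leaves non-ASCII characters untouched, so it assumes the page is served with a suitable character encoding such as UTF-8.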

Given the fact that `HTML::Entities` looks for wide characters too, the pure Perl fight might not be that fair. It could be interesting to alter the code in the pure Perl version of `HTML::Escape` to include the same cases under its own algorithm and see that fight again. – Francisco Zarabozo Apr 01 '15 at 09:11
Which do you need to encode, a string or an object? If it's just a string, then you should just have to worry about encoding issues such as UTF-8, and CGI::escape will probably do the trick for you. If it's an object, you'll need to serialize it first, which opens up a whole new set of issues, but you might want to consider JSON-encoding it.
PS. Since I can't find any recent documentation on this method (it's actually imported from CGI::Util and is marked as "internal"), you should probably use escapeHTML() instead, as daxim points out in his comment: http://search.cpan.org/perldoc?CGI#AUTOESCAPING_HTML
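A minimal sketch of the `escapeHTML` route, assuming CGI.pm is installed (it has not shipped with core Perl since 5.22, so on newer Perls it is an extra package):

```perl
use strict;
use warnings;
use CGI qw(escapeHTML);

# Same sample input used in the HTML::Entities answer
my $escaped = escapeHTML('foo & bar & <baz>');

print "$escaped\n";  # foo &amp; bar &amp; &lt;baz&gt;
```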

The function is called `escapeHTML`. Proper deeplink: http://search.cpan.org/perldoc?CGI#AUTOESCAPING_HTML – daxim Jan 21 '10 at 13:36
@daxim: `CGI::escape` very much exists; it's actually defined in CGI::Util and imported into CGI proper. If you look at the source there are some subtle differences in implementation, which are sadly not well described in the documentation. – Ether Jan 21 '10 at 19:03