
I've written a Dancer web app that uses Net::OpenID::Consumer to authenticate users via OpenID. It works well with Google and MyOpenID, but not with Yahoo. When a user tries to authenticate with a Yahoo account, HTML::Parser warns:

Parsing of undecoded UTF-8 will give garbage when decoding entities

and this warning kills my app (rightfully so).

I don't see any existing bugs with Net::OpenID::Consumer (or Common) that relate to this.
The HTTP headers and the HTML meta tags both specify UTF-8 for the 'claimed id' URI.
Why would the response not be decoded for HTML::Parser? Am I missing something obvious?

Here's the relevant code:

get '/openid_landing' => sub {
    my $params = params();
    my $csr = Net::OpenID::Consumer->new(
        ua => LWP::UserAgent->new(),
        consumer_secret => $secret,
        params => $params,
    );  
    my $id = $params->{'openid.claimed_id'};

    if (my $setup_url = $csr->user_setup_url) {
        redirect $setup_url;

    } elsif ($csr->user_cancel) {
        redirect uri_for('/');

    } elsif (my $vident = $csr->verified_identity) {
        # verified identity, log in or register user
        ...

    } else {
        die "Error validating identity: " . $csr->err;
    } 
};
Eitan T
kbosak
  • [Show your code](http://sscce.org) so that people may [reproduce the problem](http://www.chiark.greenend.org.uk/~sgtatham/bugs.html#showmehow). – daxim Jun 25 '12 at 16:13
  • Sounds like you didn't decode the HTML before passing it to Parser, so decode it. If this was LWP, I'd say use `->decoded_content` instead of `->content`. – Ωmega Jun 25 '12 at 16:25
  • user1215106, Net::OpenID::Common is grabbing and parsing the HTML, not my code. – kbosak Jun 25 '12 at 16:26

2 Answers


The bug is in Net/OpenID/URIFetch.pm, on lines 122-128 of version 1.14 (the latest). It uses the raw content of the response object instead of the decoded content. Just remove the manual gzip decoding and use the response's decoded_content method, which undoes the Content-Encoding and decodes the charset in one step.

I haven't filed a bug report yet, feel free. :)

Here's a diff you can apply to fix it:

122c122
<         my $content = $res->content;
---
>         my $content = $res->decoded_content;
126,129d125
<         if ($res->content_encoding && $res->content_encoding eq 'gzip') {
<             $content = Compress::Zlib::memGunzip($content);
<         }
<
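
To see why the choice of method matters, here is a minimal sketch that builds a response by hand and compares the two accessors. The page body and headers are made up for illustration (they are not taken from Yahoo or from URIFetch.pm); only `content` and `decoded_content` are the calls discussed above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body containing one non-ASCII character (U+00E9).
my $html  = "<p>caf\x{e9}</p>";
my $bytes = encode_utf8($html);    # UTF-8 octets, as sent on the wire

# Compress the octets the way a server using Content-Encoding: gzip would.
gzip(\$bytes => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

my $raw     = $res->content;            # untouched payload: still gzipped octets
my $decoded = $res->decoded_content;    # gunzipped, then decoded to characters

print "content returns the gzipped payload\n"
    if $raw eq $gzipped;
print "decoded_content returns the original text\n"
    if defined $decoded && $decoded eq $html;
```

Gunzipping `content` by hand, as the original lines do, would still leave UTF-8 octets rather than characters, which is the situation HTML::Parser complains about.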
Uncle Arnie

The warning comes from the HTML::Parser module, which HTML::TreeBuilder uses under the hood. Before passing the page contents to TreeBuilder, run them through decode_utf8:

use HTML::TreeBuilder;
use Encode;
my $contents = ...;
my $htree = HTML::TreeBuilder->new_from_content(decode_utf8 $contents);
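
A self-contained version of the same idea might look like the sketch below. The HTML string is a made-up stand-in for the fetched page, and the `look_down` call is only there to show that the parsed text ends up as characters rather than octets.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::TreeBuilder;

# Hypothetical page contents as raw UTF-8 octets (the e-acute is two bytes).
my $contents = "<html><body><p>caf\xc3\xa9</p></body></html>";

# Turn the octets into a character string before the parser sees them.
my $chars = decode_utf8($contents);

my $htree = HTML::TreeBuilder->new_from_content($chars);
my $text  = $htree->look_down(_tag => 'p')->as_text;

# The length is counted in characters, not in bytes of the original input.
printf "paragraph text has %d characters\n", length $text;

$htree->delete;    # free the tree explicitly
```

This only helps when your own code hands the content to the parser; in the question the parsing happens inside the OpenID library.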

For more details, see:

http://metacpan.org/pod/HTML::TreeBuilder#new-from-content

http://search.cpan.org/dist/HTML-Parser/Parser.pm

szabgab
Sathishkumar