
I'm using Perl to grab data from a SQLite database and the WWW::Mechanize module to do some web scraping.

The data I'm posting (from the database) contains some ™ characters, but when I look at the text on the website, a couple of odd characters appear instead: â¢ in place of the ™.
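One plausible explanation of the garbling, sketched with the core Encode module: ™ (U+2122) takes three bytes in UTF-8, and anything that decodes those bytes with a single-byte encoding shows three characters instead of one. Which encoding the website actually assumed is not known; cp1252 and ISO-8859-1 are shown here only as likely candidates.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# The trademark sign as a Perl character string (one character, U+2122).
my $tm = "\x{2122}";

# Encoded as UTF-8 it becomes three bytes: E2 84 A2.
my $bytes = encode('UTF-8', $tm);
printf("UTF-8 bytes: %vX\n", $bytes);       # prints "UTF-8 bytes: E2.84.A2"

# Misread as Windows-1252, each byte becomes its own character:
# "â" (E2), "„" (84 -> U+201E) and "¢" (A2).
my $as_cp1252 = decode('cp1252', $bytes);

# Misread as ISO-8859-1, byte 84 becomes an invisible C1 control
# character, which would leave just "â¢" visible on the page.
my $as_latin1 = decode('iso-8859-1', $bytes);

printf("as cp1252:   %vX\n", $as_cp1252);   # prints "as cp1252:   E2.201E.A2"
printf("as latin-1:  %vX\n", $as_latin1);   # prints "as latin-1:  E2.84.A2"
```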

I have put the following at the top of my Perl program to prevent a "Wide character" warning in the terminal:

binmode(STDOUT, ":encoding(UTF-8)");
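A minimal sketch, using only core Perl, of what an `:encoding(UTF-8)` output layer does: it encodes character strings into UTF-8 bytes as they are written, which is why the "Wide character" warning goes away. An in-memory filehandle stands in for STDOUT here so the resulting bytes can be inspected.

```perl
use strict;
use warnings;

my $tm = "\x{2122}";    # one character, U+2122

# Open an in-memory filehandle with an :encoding(UTF-8) layer, the same
# layer that binmode(STDOUT, ':encoding(UTF-8)') pushes onto STDOUT.
open(my $fh, '>:encoding(UTF-8)', \my $buffer)
    or die "Cannot open in-memory handle: $!";
print {$fh} $tm;    # no "Wide character" warning: the layer encodes it
close($fh) or die "Cannot close in-memory handle: $!";

# The buffer now holds the three UTF-8 bytes E2 84 A2.
printf("%vX\n", $buffer);    # prints "E2.84.A2"
```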

I don't know much about encoding and decoding characters, so any help would be appreciated.

Edit: After reading about Perl IO, I found this Stack Overflow answer, which solved my problem.

Asked by phpete

2 Answers


Decode inputs, encode outputs.

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';  # Outputs are UTF-8
BEGIN { binmode STDIN; }              # ...but not the raw CGI request.

use CGI qw( -utf8 );                  # Decode parameters
use DBI qw( );

{
   my $cgi = CGI->new();
   print $cgi->header(
      -type    => "text/plain",  # Just because it's shorter.
      -charset => "UTF-8",       # Tell browser encoding used.
   );

   my $dbh = DBI->connect(
      "dbi:SQLite:dbname=/tmp/tmp.sqlite", "", "",
      {
         AutoCommit     => 1,
         RaiseError     => 1,
         PrintError     => 0,
         PrintWarn      => 1,
         sqlite_unicode => 1,   # Encode and decode for us.
      },
   );

   $dbh->do("CREATE TABLE Testing ( str TEXT )");

   my $from_html_parser = "\x{2122}";

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_html_parser = %v04X\n", $from_html_parser);

   print("$from_html_parser\n");

   $dbh->do("INSERT INTO Testing VALUES (?)", undef, $from_html_parser);

   my $from_database = $dbh->selectrow_array("SELECT * FROM Testing");

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_database = %v04X\n", $from_database);

   print("$from_database\n");
}

END { unlink("/tmp/tmp.sqlite"); }
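Stripped of CGI and DBI, the same "decode inputs, encode outputs" principle can be sketched with the core Encode module. This is a hypothetical example: the input string stands in for bytes read from a file, socket or HTTP response.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical input as it might arrive from outside: raw UTF-8 bytes.
my $input_bytes = "Acme\xE2\x84\xA2";    # "Acme" followed by UTF-8 for U+2122

# 1. Decode inputs: turn bytes into a character string.
my $text = decode('UTF-8', $input_bytes);
printf("characters: %d\n", length($text));          # prints "characters: 5"

# 2. Work with characters inside the program.
my $upper = uc($text);    # U+2122 has no case mapping, so it is unchanged

# 3. Encode outputs: turn characters back into bytes just before they leave.
my $output_bytes = encode('UTF-8', $upper);
printf("bytes out: %d\n", length($output_bytes));   # prints "bytes out: 7"
```

Keeping the conversion at the program's edges means string functions such as `length` and `uc` operate on characters rather than bytes.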
Answered by ikegami

These docs helped me: Perl IO.

Then, after a couple of Google searches, I found this Stack Overflow answer, which solved my problem.

Answered by phpete