I am trying to save a web page that contains Hindi text in Unicode. I have tried several suggestions from this site and from others, but nothing seems to work.

The error that I am getting is related to wide characters.

Here is the code I am using:

use CGI;
use strict;
use warnings;
use LWP::Simple;
use utf8;

use Encode qw(decode);

    $url = "http://www.vedakosh.com/rig-veda/mandal-1/sukta-001/mantra-rig-01-001-001";
    my $content = get $url or die "Couldn't get $url" unless defined $content;
    my $file="1.html";
    open FILE, ">:encoding(UTF-8)", "$file";
    print FILE $content;

I am not sure what I am doing wrong. Can someone please help or suggest something?

  • Please provide the exact error string – pcantalupo Sep 02 '15 at 20:38
  • `my $content = get $url or die ... unless defined $content;` looks perilously close to a [`my $var = $x if $y`](http://stackoverflow.com/q/2161111) construction. `$content` cannot be defined at that point in the program, so the postfix `unless` is unnecessary. – mob Sep 02 '15 at 20:54
  • This code shouldn't produce that error. It should fail to compile because of the undeclared `$url` (and probably the `$content` too). – Jim Davis Sep 02 '15 at 21:05
  • @pcantalupo the exact error is "wide character in print at filename..." – Arun Gupta Sep 02 '15 at 21:12
  • @JimDavis, the code runs fine. The problem is that the content of the saved page is wrong. When I open the saved file, I get this: ऋषि:  (Rishi) :- मधà¥à¤šà¥à¤›à¤¨à¥à¤¦à¤¾ when I should get this: ऋषि: (Rishi) :- मधुच्छन्दाः वैश्वामित्रः – Arun Gupta Sep 02 '15 at 21:14

2 Answers

Using binmode on the file handle works for me:

use strict;
use warnings;

use CGI;
use LWP::Simple;

my $url = "http://www.vedakosh.com/rig-veda/mandal-1/sukta-001/mantra-rig-01-001-001";
my $content = get $url or die "Couldn't get $url";
my $file="1.html";

open my $fh, '>', $file
  or die $!;

binmode $fh, ':utf8';
binmode STDOUT, ':utf8';

print $fh $content;
print $content;
stevieb
  • The solution you provided still gives me the same result, not the Unicode Hindi text I wanted. I want the page saved as it is, with the Hindi text intact. – Arun Gupta Sep 02 '15 at 21:32
0

Thanks everyone. It turns out the code is fine and doesn't need any extra encoding handling. It fetches the page correctly; the only issue is that the source page itself declares the wrong encoding in its header meta tag, and I didn't notice that before.

But thanks everyone. I appreciate your support and suggestions.
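
For anyone who hits the same symptom, here is a minimal offline sketch of how a wrongly declared charset turns Hindi text into mojibake, and how decoding the raw bytes with a charset you choose avoids it. It makes some assumptions: the page bytes are really UTF-8, `decode_page` is a hypothetical helper, `sample.html` is just an illustrative file name, and the hard-coded bytes stand in for what a server would send (with LWP you could take the raw bytes from `$response->content` on an `LWP::UserAgent` response). The charset the real page declared is not known here, so ISO-8859-1 is used only as an example of a wrong one.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw page bytes with a charset we pick,
# instead of trusting whatever the page declares about itself.
sub decode_page {
    my ($bytes, $charset) = @_;
    return decode($charset, $bytes);
}

# Stand-in for the bytes a server would send: the word "ऋषि" as UTF-8.
my $bytes = encode('UTF-8', "\x{090B}\x{0937}\x{093F}");

# Wrong charset (example only): every byte becomes its own character.
my $wrong = decode_page($bytes, 'ISO-8859-1');

# Assumed-correct charset: the bytes become three Devanagari characters.
my $right = decode_page($bytes, 'UTF-8');

printf "wrong charset: %d characters\n", length $wrong;   # prints 9
printf "right charset: %d characters\n", length $right;   # prints 3

# Write the correctly decoded text through an encoding layer, which also
# avoids the "Wide character in print" warning.
my $file = 'sample.html';
open my $fh, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print {$fh} $right;
close $fh or die "Cannot close $file: $!";
```

Note that a saved page will still carry its original meta tag, so a browser may keep misreading the file until that declaration is corrected as well.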