
I'm new to PHP and I have a script that gets information from a Persian music website.

I have a problem reading Farsi characters from the website:

$this->fa_artist = $html->find('div.main-post', 0)->find('p', 0)->find('b', 1)->plaintext;
file_put_contents('fa_artist.txt', $this->fa_artist);

I save the Farsi artist name, taken from the HTML link, in fa_artist.

The name is: امیر علی

but I see this sequence:

امیرعلی

in the file.

How can I save it as Farsi characters?

Alessandro
Alireza

1 Answer


A link that contains UTF-8 (Unicode) characters should be encoded with rawurlencode, which produces the standards-compliant percent-encoded byte sequences. For example:

<?php
  echo '<a href="' . rawurlencode("امیر علی") . '">' . htmlentities("امیر علی", ENT_QUOTES, "UTF-8") . '</a>';
?>

If you view the page source, you will see:

<a href="%D8%A7%D9%85%DB%8C%D8%B1%20%D8%B9%D9%84%DB%8C">امیر علی</a>

Use rawurlencode for UTF-8 text that goes into a link (http://php.net/manual/en/function.rawurlencode.php)

Use htmlentities for UTF-8 text that is displayed on the page (http://php.net/manual/en/function.htmlentities.php)
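To check the two functions in isolation, here is a minimal sketch. The variable names and the extra ' <live>' suffix are only illustrative assumptions; the name is built from its percent-encoded bytes so the result does not depend on how this file is saved.

```php
<?php
// The name from the question, written as percent-encoded UTF-8 bytes
// so the example does not depend on the encoding of this source file.
$encoded = '%D8%A7%D9%85%DB%8C%D8%B1%20%D8%B9%D9%84%DB%8C';
$name = rawurldecode($encoded);

// rawurlencode turns every non-ASCII byte (and the space) into %XX.
$href = rawurlencode($name);

// htmlentities escapes HTML-significant characters in the visible text.
// The ' <live>' suffix is an illustrative assumption to show the escaping.
$label = htmlentities($name . ' <live>', ENT_QUOTES, 'UTF-8');

echo '<a href="' . $href . '">' . $label . '</a>', PHP_EOL;
```

Encoding and decoding round-trip, so rawurldecode on the href gives back the original UTF-8 bytes.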

Your page must also be served as UTF-8, which you can do like this:

ini_set('default_charset', 'UTF-8');

Put this at the top of your script. The script file itself should also be saved as UTF-8 without a BOM (byte order mark).

That way you can use UTF-8 directly in your project without losing anything.
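For the file-saving step in the question, here is a rough sketch. It assumes the scraped string arrives either as UTF-8 already or as numeric HTML entities (such as &#1575;); the question does not confirm which. The helper name to_utf8_text is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (an assumption, not from the original script):
// normalise scraped text to real UTF-8 characters before saving it.
function to_utf8_text(string $scraped): string
{
    // Turn entities such as &#1575; back into UTF-8 characters.
    $text = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // preg_match with the /u modifier returns 1 only for valid UTF-8 input.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    return $text;
}

// Sample input assumed for illustration: the artist name as numeric entities.
$name = to_utf8_text('&#1575;&#1605;&#1740;&#1585; &#1593;&#1604;&#1740;');
file_put_contents('fa_artist.txt', $name);
```

If the file still looks wrong after this, open it in an editor set to UTF-8 before concluding that the bytes are wrong.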

I hope this helps.

Alessandro