
For the past few days I've been trying to write code that loads some example content from another site into my site. I have a problem with the encoding of Polish characters. The source site is ISO-8859-2 and the target is UTF-8. It works in Chrome and Safari, but not in Firefox, Opera and IE. What am I doing wrong?

index.php

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Test_site</title>



<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.js"></script>
<script type="text/javascript">
    $(document).ready(function() {

        $("#content").load("curl.php #news_ajax");

    });
</script>


</head>
<body>

<h1>Test site</h1>
<div id="content"><img src="ajax-loader.gif" alt="Loading..." /></div>

</body>
</html>

curl.php

<?php
    $url = 'http://www.dominikanie.pl/';
    $htm = file_get_contents($url);
    $domain = "http://www.dominikanie.pl/";
    $htm = preg_replace("/(href|src)\=\"([^(http)])(\/)?/", "$1=\"$domain$2", $htm);
    $htm = mb_convert_encoding($htm, "ISO-8859-2",
          mb_detect_encoding($htm, "UTF-8, ISO-8859-2", true));
    echo $htm;

?>

I also tried iconv, but it made no difference. Test site

2 Answers

  • The web browser has nothing to do with file_get_contents, which runs on the server.

  • Use cURL instead of file_get_contents. See the PHP cURL documentation.

  • Also, dominikanie.pl (the source) is in UTF-8, not ISO-8859-2. This is why your conversion doesn't work.

  • You can try sending the data as an XML or JSON object when querying it via AJAX.

  • Use a newer version of jQuery.

  • iconv vs. mb: I prefer iconv. Also, in my experience encoding detection does not always work as it should, especially when there is not much data to test or when the text contains odd characters such as MS Word special characters (for example the Polish quotation marks „ ”).

  • str_replace sometimes has problems with Polish characters. It's rare, but I have run into it in the past. Also, don't use htmlentities(). It really likes to break Polish characters :]
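
To make the cURL and detection points above concrete, here is a minimal sketch that reads the charset the server declares in its Content-Type header instead of guessing it. The helper names charsetFromContentType and fetchAsUtf8 are hypothetical (they are not from the question), and the sketch assumes the curl and mbstring extensions are available and that the server declares its charset correctly.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type header value,
// e.g. "text/html; charset=ISO-8859-2". Returns null when none is declared.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: fetch a page with cURL and return its body as UTF-8.
function fetchAsUtf8(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    $charset = charsetFromContentType(is_string($contentType) ? $contentType : null);
    if ($charset === null || $charset === 'UTF-8') {
        return $body; // nothing declared, or already UTF-8: leave untouched
    }
    // Convert from the declared charset to UTF-8.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

In curl.php this would be called as `$htm = fetchAsUtf8($url);` in place of file_get_contents. If the header carries no charset, the body is returned unchanged, so a page that declares its encoding only in a meta tag would still need separate handling.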

Community
imclickingmaniac

Source site is ISO-8859-2 and target in UTF-8

So the conversion target should be UTF-8, not ISO-8859-2:

$htm = mb_convert_encoding($htm, "UTF-8",
      mb_detect_encoding($htm, "UTF-8, ISO-8859-2", true));
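
A variation on the same idea, as a sketch rather than a definitive fix: validate the bytes as UTF-8 first and convert only when that check fails, so a page that is already UTF-8 is never encoded twice. The function name ensureUtf8 and the ISO-8859-2 fallback are assumptions for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: return $html as UTF-8, converting from $fallback only
// when the bytes are not already valid UTF-8.
function ensureUtf8(string $html, string $fallback = 'ISO-8859-2'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid UTF-8; converting again would double-encode
    }
    // Assumption: anything that is not valid UTF-8 is in the fallback charset.
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}
```

In curl.php this would replace the mb_convert_encoding call, for example `echo ensureUtf8($htm);`. The fallback is still a guess: bytes that are not valid UTF-8 are simply assumed to be ISO-8859-2.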
Amir