I want to get some information from a site with CP1251 encoding.

use Goutte\Client;
use Nonlux\Bundle\Entity\News;
....
protected function downloadQueuePage()
{
    $cli = new Client();
    // Take the next relative URL from the queue.
    $url = array_pop($this->_url);
    $this->output->writeln("http://www.baikal-daily.ru" . $url);
    // Fetch and parse the page; the site is served in CP1251.
    $cra = $cli->request("GET", "http://www.baikal-daily.ru" . $url);
    $news = new News();
    $news->setSiteId(1);
    $news->setUrl($url);
    // Extract the headline text from the parsed page.
    $news->setTitle($cra->filter("#content .main h3")->text());
}

On some pages the default Crawler returns empty h1 nodes, even though the element exists on the page and the markup appears to be valid. After some magic with the code of Goutte, Crawler, and iconv, in one case I got:

В Улан-Удэ трёхлетний мальчик упал в открытый колодец
упал в открытый колодец
�й колодец
дец
�

rather than:

В Улан-Удэ трёхлетний мальчик упал в открытый колодец

Another time, the console beeped repeatedly while dumping the received pages. How can I solve this problem? Where should I look for the source of the evil?
