Questions tagged [domcrawler]

The DomCrawler is a Symfony component for PHP which eases DOM navigation for HTML and XML documents.

The DomCrawler component is part of the Symfony PHP components and eases DOM navigation for HTML and XML documents.

The filter() method accepts CSS selectors, much like the jQuery selector syntax, which eases the selection of HTML elements and attributes.
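As a rough illustration of the selector-based filtering described above, here is a minimal sketch. It assumes `symfony/dom-crawler` and `symfony/css-selector` are installed through Composer; the markup, element ids, and variable names are invented for the example and do not come from any question listed here.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup, made up for this sketch.
$html = <<<'HTML'
<html><body>
  <div id="author-1" class="card"><a href="/a/1">Ada</a></div>
  <div id="author-2" class="card"><a href="/a/2">Linus</a></div>
  <div id="footer">Footer</div>
</body></html>
HTML;

$crawler = new Crawler($html);

// CSS attribute selector: every div whose id contains "author-".
$names = $crawler->filter('div[id*="author-"]')->each(
    function (Crawler $node, $i) {
        return trim($node->text());
    }
);

// extract() collects the requested attribute from each matched node.
$links = $crawler->filter('div.card > a')->extract(['href']);

var_dump($names, $links);
```

Here `filter()` returns a new Crawler holding the matched nodes, so calls such as `each()` and `extract()` can be chained onto it.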

179 questions

1 answer

Goutte - Guzzle/DomCrawler - Scraping from HTML tables - Lots of complications

I started using Goutte to get the info I need from sites, and it's brilliant; it's saving me serious amounts of time and hassle. On the other hand, anomalies sometimes arise, and I have no idea what causes them. So here's a page I'm scraping from right…

asked Nov 07 '16 at 16:33

Horse O'Houlihan

1 answer

Guzzle and DomCrawler

I'm using Guzzle and DomCrawler to scrape data from a webpage, and everything's working well except for one issue: it's inserting weird characters into the data that I scrape. Here's an example: [2]=> array(4) { ["cell_lines"]=> string(4) "A549" …

php guzzle domcrawler

asked Nov 05 '16 at 12:49

Horse O'Houlihan

2 answers

I have a 302 redirect pointing to www. but Googlebot keeps crawling non-www URLs

Do you know if it is possible to force robots to crawl www.domaine.com and not domaine.com? In my case, I have a web app that has enabled cached URLs with prerender.io (to view the HTML code), but only on www. So, when the robots crawl on…

seo web-crawler google-crawlers domcrawler

asked Sep 21 '16 at 09:37

Stéphane R.

1 answer

How to login on Amazon using Guzzle PHP

I'm trying to log in to Amazon using Guzzle but I'm not having any luck. Here's my code: $client = new \GuzzleHttp\Client(['cookies' => true]); $response = $client->request('POST', 'https://www.amazon.com/gp/sign-in.html', [ 'form_params' => [ …

php symfony web-scraping guzzle domcrawler

asked Jul 24 '16 at 21:41

Lincoln

1 answer

Updating an array within an anonymous function not working

I am trying to use a package called Goutte (php scraper/web-crawler) like this:

php arrays goutte domcrawler

asked Jul 16 '16 at 18:31

Latheesan

1 answer

Quickest, most efficient way to generate a page hit

I am trying to crawl every page on my site (run by cron) to update data. There are roughly 500 pages. I have tried two options: PHP Simple HTML DOM Parser and PHP get_headers. Using either of the above, each page takes roughly 1.402 seconds to load. In…

php web-crawler domcrawler

asked Jun 23 '16 at 10:41

danyo

2 answers

Node list is empty: button is glyphicon

A functional test with $form = $crawler->selectButton('input[type=submit]')->form(); fails with "The current node list is empty". Source code:

1 answer

Symfony dom-crawler string in script tag convert to UTF8

I have this HTML content:

测试

When I use Symfony's dom-crawler, the text is being HTML encoded. How can I prevent…

php symfony utf-8 domcrawler

asked Apr 09 '16 at 17:48

hooklife

0 answers

Difference between crawling and getting links with Html Agility Pack

I am getting the links of a website using Html Agility Pack in a C# console application, by specifying the divs I want and getting the links from those divs. My question is: is what I am doing crawling or parsing? If not, then what is crawling?

parsing web-crawler console-application html-agility-pack domcrawler

asked Mar 31 '16 at 04:20

Shah Rukh

1 answer

Symfony + DomCrawler - how to extract data attributes from a

I'm using Symfony 2.8 & DomCrawler to parse a web site and I'm having a problem reading data attributes from an HTML entity. It might be as simple as a specific convention for data attributes, but I've not been able to find any references or examples…

symfony domcrawler

asked Feb 25 '16 at 07:33

LarryN

2 answers

How to combine the text node of 2 pieces of extracted data using Goutte/Domcrawler

I've been trying to figure out how to combine two pieces of extracted text into a single result (array). In this case, the title and subtitle of a variety of books. Carrots Like Peas

php goutte domcrawler

asked Jan 28 '16 at 03:19

Fireflight

1 answer

Symfony DomCrawler id*='text'

I'm trying to have DomCrawler select all DIVs whose IDs contain "author-". I currently have $list = $crawler->filter('div[id*="actor-"]')->each(function (Crawler $node, $i) { return $node->text(); }); var_dump($list); But that doesn't return any…

symfony domcrawler

asked Jan 11 '16 at 11:45

user2077592

0 answers

DomCrawler Select All Input Tags Within Form

I have a webpage I'm scraping form fields from (or trying at least). I'm using Symfony2 (and Goutte) to do this, so I have a $crawler object. Here's an example of the html below: ... other html stuff ...

symfony xpath domcrawler

asked Dec 30 '15 at 14:51

Kenny

2 answers

Symfony 2 DomCrawler: how to get all child elements of

Maybe this is a stupid question, but I need to get an object with all HTML nodes from a selected HTML page. I have to make all nodes selectable, especially the opening tags. If anyone knows the template engine from TYPO3, TemplaVoila, I think this…

symfony domcrawler

asked Nov 27 '15 at 08:17

TheTom

1 answer

Can't select link

I'm attempting to scrape the href of each .row. Ultimately, I'd like to click the link and access the DOM it links to, but I can't get either a Link object or the href attribute. Not sure if the fact that the a attributes don't have any text in…

php symfony web-scraping css-selectors domcrawler

asked Nov 21 '15 at 04:41

Cesar Vega
