Questions tagged [domcrawler]

DomCrawler is a Symfony component for PHP that eases DOM navigation for HTML and XML documents.

The filter() method accepts CSS selectors, much like jQuery, and eases the selection of HTML elements and attributes. It requires the Symfony CssSelector component.

179 questions
0
votes
1 answer

Goutte - Guzzle/DomCrawler - Scraping from HTML tables - Lots of complications

I started using Goutte to get the info I need from sites, and it's brilliant; it's saving me serious amounts of time and hassle. On the other hand, sometimes anomalies arise, and I have no idea what causes them. So here's a page I'm scraping from right…
Horse O'Houlihan
  • 1,659
  • 4
  • 14
  • 29
0
votes
1 answer

Guzzle and DomCrawler

I'm using Guzzle and DomCrawler to scrape data from a webpage. Everything's working well except for one issue: it's inserting weird characters into the data that I scrape. Here's an example: [2]=> array(4) { ["cell_lines"]=> string(4) "A549" …
Horse O'Houlihan
  • 1,659
  • 4
  • 14
  • 29
0
votes
2 answers

I have a 302 redirect pointing to www. but Googlebot keeps crawling non-www URLs

Do you know if it is possible to force robots to crawl www.domaine.com and not domaine.com? In my case, I have a web app that has enabled cached URLs with prerender.io (to view the HTML code), but only on www. So, when the robots crawl on…
Stéphane R.
  • 1,386
  • 3
  • 19
  • 37
0
votes
1 answer

How to log in to Amazon using Guzzle PHP

I'm trying to log in to Amazon using Guzzle, but I'm not having any luck. Here's my code: $client = new \GuzzleHttp\Client(['cookies' => true]); $response = $client->request('POST', 'https://www.amazon.com/gp/sign-in.html', [ 'form_params' => [ …
Lincoln
  • 880
  • 14
  • 25
0
votes
1 answer

Updating an array within an anonymous function not working

I am trying to use a package called Goutte (a PHP scraper/web crawler) like this:
Latheesan
  • 23,247
  • 32
  • 107
  • 201
0
votes
1 answer

quickest most efficient way to generate a page hit

I am trying to crawl every page on my site (run by cron) to update data. There are roughly 500 pages. I have tried two options: PHP Simple HTML DOM Parser and PHP get_headers. Using either of the above, each page takes roughly 1.402 seconds to load. In…
danyo
  • 5,686
  • 20
  • 59
  • 119
0
votes
2 answers

Node list is empty: button is glyphicon

A functional test with $form = $crawler->selectButton('input[type=submit]')->form(); fails with "The current node list is empty". Source code:
0
votes
1 answer

Symfony dom-crawler string in script tag convert to UTF8

I have this HTML content:
测试
When I use Symfony's dom-crawler, the text is being HTML encoded. How can I prevent…
hooklife
  • 13
  • 1
  • 3
0
votes
0 answers

Difference between crawling and getting links with Html Agility Pack

I am getting the links of a website using Html Agility Pack in a C# console application, by specifying the divs I want and getting the links from those divs. My question is: is what I am doing crawling or parsing? If it is not crawling, then what is crawling?
0
votes
1 answer

Symfony + DomCrawler - how to extract data attributes from a

I'm using Symfony 2.8 & DomCrawler to parse a web site and I'm having a problem reading data attributes from an HTML entity. It might be as simple as a specific convention for data attributes, but I've not been able to find any references or examples…
LarryN
  • 196
  • 1
  • 8
0
votes
2 answers

How to combine the text node of 2 pieces of extracted data using Goutte/Domcrawler

I've been trying to figure out how to combine two pieces of extracted text into a single result (array). In this case, the title and subtitle of a variety of books. Carrots Like Peas
Fireflight
  • 2,921
  • 5
  • 24
  • 22
0
votes
1 answer

SymFony DomCrawler id*='text'

I'm trying to have DomCrawler select all DIVs whose IDs contain "author-". I currently have $list = $crawler->filter('div[id*="actor-"]')->each(function (Crawler $node, $i) { return $node->text(); }); var_dump($list); But that doesn't return any…
user2077592
0
votes
0 answers

DomCrawler Select All Input Tags Within Form

I have a webpage I'm scraping form fields from (or trying at least). I'm using Symfony2 (and Goutte) to do this, so I have a $crawler object. Here's an example of the html below: ... other html stuff ...
Kenny
  • 2,124
  • 3
  • 33
  • 63
0
votes
2 answers

symfony 2 domCrawler how to get all child elements of

Maybe this is a stupid question, but I need to get an object with all HTML nodes from a selected HTML page. I have to make all nodes selectable, especially the opening tags. If anyone knows the template engine from TYPO3, TemplaVoila; I think this…
TheTom
  • 934
  • 2
  • 14
  • 40
0
votes
1 answer

Can't select link

I'm attempting to scrape the href of each .row. Ultimately, I'd like to click the link and access the DOM it links to, but I can't get either a Link object or the href attribute. Not sure if the fact that the a attributes don't have any text in…