-4

Is it possible to scrape the web based on keywords, using search engines, in PHP?

For example, when someone enters a keyword, the script would search Google, fetch the result pages, and scrape/extract the lines that contain the matched keyword.

Any ideas or libraries to refer to?

Has QUIT--Anony-Mousse

2 Answers

0

You can use PHP's `file_get_contents` function:

file_get_contents('web url goes here');

For example: file_get_contents('http://www.google.com');

That function returns the HTML served at the URL as a string (or `false` on failure). You can then load it into a DOM parser and use XPath to extract the HTML elements that hold the data you want.
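As a rough sketch of the XPath step, assuming PHP's bundled DOM extension is enabled: the helper name `extractByXPath`, the sample markup, and the `//p` query are made up for illustration and are not taken from the linked gist.

```php
<?php
// Hypothetical helper: return the trimmed text of every node matched by an XPath query.
function extractByXPath(string $html, string $query): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);

    $results = [];
    if ($nodes === false) {
        // The XPath expression itself was invalid.
        return $results;
    }
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// With a live page (placeholder URL; needs allow_url_fopen to be enabled):
// $html = file_get_contents('http://www.example.com');
// if ($html !== false) { print_r(extractByXPath($html, '//h1')); }

// Offline demonstration on an inline sample document.
$sample = '<html><body><h1>Hello</h1><p>First</p><p>Second</p></body></html>';
print_r(extractByXPath($sample, '//p'));
```

The query string is the only part that depends on the target site, so the same helper can be reused with a different expression per page.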

An example with more explanation is at the URL below.

https://gist.github.com/anchetaWern/6150297

I have personally done something similar to what you describe, although in Ruby on Rails; you can explore the project here.

https://github.com/dvarun/gextract

The XPath that I used is here: https://github.com/dvarun/gextract/blob/master/app/jobs/fetch_keyword_job.rb

  • What if I don't have a specific website to scrape and just want general data from any website through Google? –  Jun 29 '18 at 07:21
0

You can do that using the Google Custom Search JSON API (https://developers.google.com/custom-search/json-api/v1/overview) and the related PHP client (https://github.com/google/google-api-php-client).
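For a sense of what the search step involves, here is a minimal sketch that calls the JSON endpoint directly rather than going through the client library. It assumes you already have an API key and a search engine ID (`cx`); the helper names `buildSearchUrl` and `parseResultLinks` are hypothetical, and the sample response is trimmed down to the `items[].link` field.

```php
<?php
// Hypothetical helper: build the request URL for the Custom Search JSON API.
function buildSearchUrl(string $apiKey, string $engineId, string $keyword): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,   // your API key (assumed to exist)
        'cx'  => $engineId, // your search engine ID (assumed to exist)
        'q'   => $keyword,
    ]);
}

// Hypothetical helper: pull the result URLs out of the JSON response body.
function parseResultLinks(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return []; // malformed response or no results
    }
    $links = [];
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $links[] = $item['link'];
        }
    }
    return $links;
}

// Live usage (placeholders, requires network access and valid credentials):
// $json = file_get_contents(buildSearchUrl('YOUR_API_KEY', 'YOUR_CX_ID', 'web scraping'));
// $links = ($json !== false) ? parseResultLinks($json) : [];

// Offline demonstration with a trimmed, made-up response body.
$sampleJson = '{"items":[{"title":"A","link":"https://example.com/a"},'
            . '{"title":"B","link":"https://example.org/b"}]}';
print_r(parseResultLinks($sampleJson));
```

The returned links are what the scraper would then download and search for the keyword.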

After that, you need to write a web scraper that downloads the result pages (e.g. with curl) and parses the HTML with a parser (e.g. https://github.com/paquettg/php-html-parser).

I would, however, not recommend PHP for the latter task. There are much more sophisticated scraping tools available for Python (e.g. BeautifulSoup or Scrapy) that will make your life much, much easier than using PHP.

Simas Joneliunas
  • But those tools require a path or HTML tag that targets a specific element. In my case I want to get the data regardless of the website name or domain; I just need to find a page that has the keyword and extract the line that contains it. –  Jun 29 '18 at 07:27
  • If you need the line together with its HTML, then you are correct, `strpos(..)` will work just fine. But if you only need the plain text, you can also use the following XPath selector in your language of choice: `xpath("//*[contains(text(), 'KEYWORD')]")` – Simas Joneliunas Jun 29 '18 at 08:18
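
A site-independent version of the keyword extraction discussed in these comments might look like the sketch below. It is one possible approach, not the method from either answer: it walks every text node with XPath, skips `script` and `style` content, and does the keyword match in PHP with `stripos` so that it is case-insensitive (XPath 1.0 `contains()` is case-sensitive). The function name `linesContainingKeyword` and the sample page are hypothetical, and "line" here means one text node, so text split by inline tags such as `<b>` comes back as separate pieces.

```php
<?php
// Hypothetical helper: return every visible text fragment that contains the keyword.
function linesContainingKeyword(string $html, string $keyword): array
{
    $doc = new DOMDocument();

    // Tolerate invalid markup without emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // All text nodes except those inside <script> or <style>.
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');

    $matches = [];
    foreach ($nodes as $node) {
        // Collapse runs of whitespace so each fragment reads as a single line.
        $text = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
        if ($text !== '' && stripos($text, $keyword) !== false) {
            $matches[] = $text;
        }
    }
    return $matches;
}

// Offline demonstration on a made-up page.
$page = '<html><body><h1>Learn PHP</h1><p>Nothing here</p>'
      . '<p>Why php works</p><script>var php = 1;</script></body></html>';
print_r(linesContainingKeyword($page, 'php'));
```

Because nothing in the function refers to a particular tag or domain, it can be applied to every page returned by the search step.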