I am working on scraping websites, I have tried many technologies to scrape websites.
First of all I used PHP cURL as a scraping tool, and went up to some extent to scrape websites, but then I faced a problem, that was; the PHP cURL couldn't scrape websites that used Ajax to load the website contents/data. And that's what stopped me scraping through PHP.
After a decent research I have found another solution to scrape websites, that were beyond the limitation of Ajax loaded websites etc, and was very powerful and cool to use, they were indeed Phantom JS and Casper JS. I have scraped lot of sites with it.
The problem I faced with these tools was that, these tools works/controlled through the command line interface, for example when you want to run the Phantom/Casper JS code, you need to run it through the command line. And this is my basic problem. What I need is, to write the code in Phantom/Casper JS and I want to have a webpage with admin panel, where I can control these scripts. Currently I am scraping career/jobs listings websites, and I want to automate these tools, to scrape these sites automatically after a given time, to stay updated with the employers sites, who posts new jobs.
For instance, I have code for each website separately and I manually execute each file through the command line and then wait for it to finish scraping and then I continue with second one and so on. What I want to have is, I write a script in JavaScript (preferably in Node JS - but not compulsory) which will execute the scraper code after a specific instance, and then will start scraping all of the websites in the background.
I can do the automation, its not a problem, but the problem is, I am unable to connect the Phantom/Casper JS with the website, even I tried Spooky JS which connects Phantom/Casper JS with Node JS, but unfortunately it doesn't work for me, and its alot messy.
Is there any other tool that's powerful like these two, and I can easily interact with them through a webpage ?