
Although there have been quite a few posts on this topic, my question is a little more specific. I need to parse a few websites and, once that's done, send some data to them. For example, say website A offers a search tab; I need to feed data to it programmatically. The resulting page might differ depending on the target site's updates. I want to write such a crawler. Which tools/languages would be best for this? I am already well-versed in Java and C, so anything based on those would be really helpful.

Ajith Kamath

1 Answer


I would suggest using PhantomJS. It's completely free, and Windows, Linux, and Mac are all supported.

  • It is very simple to install.
  • It is very simple to execute from the command line.
  • The community is pretty big, and solving straightforward problems is trivial.
  • It uses JavaScript as the scripting language, so with your Java background you should pick it up quickly (see the sketch after this list).
  • You'll have to get familiar with the DOM structure; you cannot write a crawler without knowing it (even if you choose a completely visual solution).
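
To make this concrete, here is a minimal sketch of a PhantomJS script that loads a page, fills in a search field, and submits the form. The URL, the input name "q", and the three-second wait are assumptions for illustration; adapt them to the actual target site.

    // search.js - fill a search form and dump the resulting page.
    // Assumes a hypothetical site with an <input name="q"> inside a form.
    var page = require('webpage').create();

    page.open('http://example.com', function (status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }

        // Runs inside the page context: fill the field and submit its form.
        var submitted = page.evaluate(function () {
            var input = document.querySelector('input[name="q"]');
            if (!input || !input.form) { return false; }
            input.value = 'my search term';
            input.form.submit();
            return true;
        });

        if (!submitted) {
            console.log('Search field not found');
            phantom.exit(1);
            return;
        }

        // Give the result page a moment to load, then print its HTML.
        setTimeout(function () {
            console.log(page.content);
            phantom.exit();
        }, 3000);
    });

Run it with phantomjs search.js and the resulting page's HTML is written to standard output, ready to be parsed.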

Everything depends on how frequently the crawler needs to run: PhantomJS is great for long-term, recurring jobs. If you're looking for a one-time solution, use something visual like iMacros instead; it is available as a Firefox extension free of charge, and there's also a standalone version that costs money.

Cheers

Andrey Petrenko