Although there have been quite a few posts on this topic, my question is a little more specific. I need to parse a few websites and, once that's done, send some data back to them. For example, say website A offers a search tab; I need to programmatically feed data into it. The resulting page might differ depending on the target site's updates. I want to code such a crawler. Which tools/languages would be best for this? I am already well-versed in Java and C, so anything based on these would be really helpful.
I would suggest using PhantomJS. It's completely free, and Windows, Linux, and Mac are all supported.
- It is very simple to install.
- It is very simple to run from the command line.
- The community is pretty big, and solving straightforward problems is trivial.
- It uses JavaScript as the scripting language, so I guess you'll be fine with your Java background.
- You'll have to get familiar with the DOM structure. You cannot write a crawler without knowing it (even if you choose a completely visual solution).
Everything depends on how frequently the crawler needs to run: PhantomJS is great for long-term, recurring jobs. If you're looking for a one-time solution, use something visual instead, like iMacros. It can be used inside Mozilla as an extension (free of charge), and there's a standalone version that costs money.
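To give you a feel for the scripting model, here's a minimal PhantomJS sketch that loads a page, fills in a search field, submits the form, and prints the title of the result page. The URL and the `input[name="q"]` selector are made-up placeholders; you'd swap in whatever website A actually uses:

    // search.js -- run with: phantomjs search.js
    // NOTE: the URL and selector below are hypothetical placeholders.
    var page = require('webpage').create();

    page.open('http://example.com/search', function (status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
        }

        // Fill the (assumed) search box and submit its form.
        // page.evaluate runs inside the page's own DOM context.
        page.evaluate(function () {
            var input = document.querySelector('input[name="q"]'); // assumed field name
            input.value = 'my search term';
            input.form.submit();
        });

        // Crude fixed wait for the result page; for a real crawler,
        // hook page.onLoadFinished or poll for an expected element instead.
        setTimeout(function () {
            console.log('Result page title: ' + page.evaluate(function () {
                return document.title;
            }));
            phantom.exit();
        }, 3000);
    });

Since everything inside page.evaluate executes in the page's own context, the DOM knowledge I mentioned above is exactly what you need there.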
Cheers

– Andrey Petrenko
Thanks Andrey! I will try PhantomJS for now. Looks interesting! :) – Ajith Kamath Mar 28 '13 at 08:04