Can RapidMiner extract an XPath value from a specific list of URLs?

Question

I've been playing around with RapidMiner and can't quite seem to figure this out. I have a huge list of URLs in an Excel file and would like to extract a single XPath element from each URL. Is there any way I could do this with RapidMiner?

I've seen the tutorials by Neil McGuigan, but they seem to crawl the web or a whole site in general rather than work from a specific set of URLs.

Looks like a pretty similar question to the one [here](http://stackoverflow.com/questions/9045024/can-rapidminer-extract-xpaths-from-a-list-of-urls-instead-of-first-saving-the-h). — Steven Maude, Apr 29 '14 at 15:24

score 0 · Answer 1 · answered Apr 29 '14 at 13:12

You may want to look at Apache Nutch, Scrapy and similar web crawling tools.

You may just be looking at the wrong tools for this job: you want to scrape data from web sites, not do actual "data mining" (which is more of a heuristic statistical analysis).

Nutch (Java) and Scrapy (Python) are platforms for developing custom web crawlers and doing web scraping.
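If a full crawler framework is more than you need, the "fixed list of URLs, one path per page" task can be sketched in plain Python. This is only a rough illustration, not a RapidMiner or Scrapy recipe. It assumes the URLs have already been copied out of the spreadsheet into a Python list, and it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and only parses well-formed XML/XHTML. Real-world HTML is often not well-formed, so in practice you would probably swap in a lenient HTML parser. The function names (`fetch`, `extract_first`, `extract_from_urls`) and the example URLs are made up for this sketch.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_first(markup, path):
    """Return the stripped text of the first element matching `path`.

    Returns None if the markup is not well-formed, nothing matches,
    or the matched element has no text. Only ElementTree's limited
    XPath subset is supported here.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def extract_from_urls(urls, path, fetcher=fetch):
    """Map each URL to the extracted value (None on any failure).

    `fetcher` is injectable so the logic can be tried without a network.
    """
    results = {}
    for url in urls:
        try:
            results[url] = extract_first(fetcher(url), path)
        except (OSError, ValueError):
            # Network errors (URLError is an OSError) or a malformed URL.
            results[url] = None
    return results


if __name__ == "__main__":
    # Placeholder URLs; replace with the list exported from the spreadsheet.
    urls = ["http://example.com/", "http://example.org/"]
    for url, value in extract_from_urls(urls, ".//title").items():
        print(url, "->", value)
```

Pages that fail to download or parse simply map to `None`, so one bad URL does not stop the rest of the list from being processed.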
