-1

I've been playing around with RapidMiner and can't quite figure this out. I have a huge list of URLs in an Excel file and would like to extract a single XPath element from each URL. Is there any way I could do this with RapidMiner?

I've seen the tutorials by Neil McGuigan, but they seem to crawl the web or a site in general rather than work from a specific set of URLs.

Has QUIT--Anony-Mousse
  • Looks like a pretty similar question to the one [here](http://stackoverflow.com/questions/9045024/can-rapidminer-extract-xpaths-from-a-list-of-urls-instead-of-first-saving-the-h). – Steven Maude Apr 29 '14 at 15:24

1 Answer

0

You may want to look at Apache Nutch, Scrapy and similar web crawling tools.

You may just be looking at the wrong tools for this job: you want to scrape data from web sites, not do actual "data mining" (which is more of a heuristic statistical analysis).

Nutch (Java) and Scrapy (Python) are platforms for developing custom web crawlers and doing web scraping.
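
To give a rough idea of what the scraping approach looks like for a fixed list of URLs, here is a minimal sketch using only the Python standard library. It assumes the Excel sheet has been exported to CSV with one URL per row in the first column; the function names (`read_urls`, `extract_first`, `scrape`) and the `.//h1` path are made up for illustration. Note that `xml.etree.ElementTree` only understands a subset of XPath and needs well-formed markup, so for real-world HTML you would likely swap in Scrapy's selectors or a lenient HTML parser.

```python
import csv
import urllib.request
import xml.etree.ElementTree as ET


def read_urls(handle):
    """Read URLs from the first column of a CSV file object, skipping blank rows."""
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_first(markup, path):
    """Return the text of the first element matching `path`, or None.

    Assumption: the markup is well-formed (XHTML-like); ElementTree
    rejects the sloppy HTML that many real pages serve.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    node = root.find(path)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


def scrape(urls, path, fetcher=fetch):
    """Map each URL to the extracted text, or None if fetching or matching fails."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_first(fetcher(url), path)
        except (OSError, ValueError):
            # Network errors (URLError is an OSError) or malformed URLs
            results[url] = None
    return results


if __name__ == "__main__":
    # "urls.csv" is a hypothetical export of the Excel sheet
    with open("urls.csv", newline="") as handle:
        for url, text in scrape(read_urls(handle), ".//h1").items():
            print(url, text, sep="\t")
```

The `fetcher` argument is there so the extraction logic can be tried against canned pages without touching the network.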

Has QUIT--Anony-Mousse