Questions tagged [webharvest]

Web-Harvest is an open-source web data extraction tool written in Java.

It offers a way to collect the desired web pages and extract useful data from them. To do that, it leverages well-established techniques and technologies for text and XML manipulation, such as XSLT, XQuery, and regular expressions. Web-Harvest mainly focuses on HTML/XML-based websites, which still make up the vast majority of web content. It can also easily be supplemented with custom Java libraries to augment its extraction capabilities.
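The extraction style described above (select parts of a page with a path expression, then refine the text with regular expressions) can be illustrated in plain Java. This is a minimal sketch and does not use Web-Harvest's own configuration format or API: it relies only on the JDK's `javax.xml.xpath` (XPath is the path language that XSLT and XQuery build on) and `java.util.regex`. The class name, method names, and sample page fragment are hypothetical. It also assumes the input is already well-formed XML; real-world HTML often is not, and would need to be cleaned into XML first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Web-Harvest.
public class ExtractSketch {

    // Step 1: parse well-formed XML and return the trimmed text of every
    // node matched by the XPath expression.
    static List<String> xpathTexts(String xml, String expr) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    // Step 2: refine each extracted string with a regular expression,
    // keeping the first capture group of the first match (strings with
    // no match are skipped).
    static List<String> regexFirstGroup(List<String> inputs, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String s : inputs) {
            Matcher m = p.matcher(s);
            if (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page fragment that is already well-formed XML.
        String page = "<html><body>"
                + "<div class='item'>Widget - $19.99</div>"
                + "<div class='item'>Gadget - $5.00</div>"
                + "<div class='ad'>Sponsored</div>"
                + "</body></html>";
        List<String> items = xpathTexts(page, "//div[@class='item']");
        List<String> prices = regexFirstGroup(items, "\\$(\\d+\\.\\d{2})");
        System.out.println(items);   // [Widget - $19.99, Gadget - $5.00]
        System.out.println(prices);  // [19.99, 5.00]
    }
}
```

Splitting the work into a structural step (XPath) and a textual step (regex) keeps each expression simple: the XPath only has to know the page layout, and the regex only has to know the format of one value.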

71 questions
0 votes, 1 answer

WebHarvest: need 50 results in one request

I am new to this language and I am stuck on a simple task. I would like to get 50 results instead of the default 10 that the search engine returns. Here is the code:
sahithi
0 votes, 1 answer

WebHarvest XML not well formed

I am using WebHarvest to retrieve data from Woot.com and I'm getting a few different errors. I can fetch the website with the first process, but when I try to test XPath inside the variable window I get the error…