Questions tagged [crawler4j]

Crawler4j is an open source Java crawler that provides a simple interface for crawling the Web.

Reference: https://github.com/yasserg/crawler4j

174 questions
1
vote
1 answer

Crawler4j - Getting exception java.lang.NoSuchMethodError

I am trying to set up crawler4j via Eclipse (Juno). When I run it, I get the exception below (even though the program keeps running without logging anything): "Exception in thread "main" java.lang.NoSuchMethodError: …
neha
  • 11
  • 2
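A NoSuchMethodError at startup almost always means mismatched jars on the classpath (for example, two different crawler4j versions, or a stale dependency jar) rather than a bug in the code. A minimal bootstrap along the lines of the project's README example is a useful sanity check that the jars agree; the storage folder and seed URL below are placeholders:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class CrawlerBootstrap {
    public static class MyCrawler extends WebCrawler {
        @Override
        public void visit(Page page) {
            System.out.println("Visited: " + page.getWebURL().getURL());
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawl-root");  // placeholder path

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller =
                new CrawlController(config, pageFetcher, robotstxtServer);

        controller.addSeed("https://www.ics.uci.edu/");  // placeholder seed
        controller.start(MyCrawler.class, 5);            // 5 crawler threads
    }
}
```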
1
vote
2 answers

What does !FILTERS mean?

I have recently implemented Crawler4j and I am trying to teach myself the code by breaking it down line by line. I am having trouble understanding what the !FILTERS object on the line of code below means. @Override public boolean…
Octavius
  • 583
  • 5
  • 19
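In the stock crawler4j example this question refers to, FILTERS is a compiled regular expression listing file extensions the crawler should skip, and the ! simply negates the match. A sketch based on that standard example (the domain is a placeholder; older 3.x releases declare shouldVisit with only the WebURL parameter):

```java
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class FilteringCrawler extends WebCrawler {
    // Compiled regex of file extensions the crawler should skip.
    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz))$");

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        String href = url.getURL().toLowerCase();
        // "!" negates the match: visit the URL only if it does NOT end
        // in one of the filtered extensions (and stays on our domain).
        return !FILTERS.matcher(href).matches()
                && href.startsWith("https://www.example.com/");
    }
}
```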
1
vote
2 answers

NoSuchMethodError in crawler4j CrawlController class

I am using the example given here and included the necessary files (crawler4j-3.3.zip & crawler4j-3.x-dependencies.zip) from [here](http://code.google.com/p/crawler4j/downloads/list) in my build path and run path. I am getting this error: Exception in thread…
user801154
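A quick way to confirm a version mismatch is to ask the JVM which jar it actually loaded the class from. This is plain Java, independent of crawler4j:

```java
import edu.uci.ics.crawler4j.crawler.CrawlController;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar CrawlController was actually loaded from.
        // If this is not the crawler4j-3.3 jar you added, the
        // NoSuchMethodError comes from a version mismatch.
        System.out.println(CrawlController.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}
```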
1
vote
1 answer

crawler4j to crawl a list of URLs without crawling the entire website

I have a list of web URLs that need to be crawled. Is it possible to crawl only the listed pages without crawling deeper? If I add a URL as a seed, it crawls the full website at full depth.
Ramesh
  • 2,295
  • 5
  • 35
  • 64
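crawler4j supports this directly: seeds start at depth 0, so limiting the maximum crawl depth to 0 makes the crawler fetch only the seed URLs and follow no links. A minimal sketch with placeholder paths and URLs:

```java
import java.util.Arrays;
import java.util.List;

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class SeedOnlyCrawl {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawl-root");  // placeholder path
        // Depth 0 = fetch only the seeds themselves; no links are followed.
        config.setMaxDepthOfCrawling(0);

        PageFetcher fetcher = new PageFetcher(config);
        RobotstxtServer robots = new RobotstxtServer(new RobotstxtConfig(), fetcher);
        CrawlController controller = new CrawlController(config, fetcher, robots);

        List<String> urls = Arrays.asList(
                "https://example.com/page1",   // placeholder URLs
                "https://example.com/page2");
        for (String url : urls) {
            controller.addSeed(url);
        }
        // Substitute your own WebCrawler subclass for real page handling.
        controller.start(WebCrawler.class, 2);
    }
}
```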
1
vote
1 answer

How to extract all links on a page using crawler4j?

I am implementing a web crawler using the Crawler4j library, but I am not getting all the links on a web site. I tried to extract all the links on one page using Crawler4j and missed some links. Crawler4j version: crawler4j-3.3. URL I used…
user801154
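In crawler4j, the links extracted from a fetched page are exposed through its parse data. Note that the parser only sees the raw HTML, so links generated by JavaScript will always be missed. A sketch listing every outgoing URL of a visited page:

```java
import java.util.Set;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class LinkListingCrawler extends WebCrawler {
    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData html = (HtmlParseData) page.getParseData();
            // Every link the parser extracted from the page's HTML.
            Set<WebURL> links = html.getOutgoingUrls();
            for (WebURL link : links) {
                System.out.println(link.getURL());
            }
        }
    }
}
```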
1
vote
1 answer

Crawler4j gives null as parentURL and zero as parentDocID in url redirection

I am using the latest version of Crawler4j to crawl some feed URLs. I've passed some seed URLs along with the doc ID and I have also set the depth to zero as I only want the content of that page. The problem is that I am not able to get the…
Pratik
  • 51
  • 3
  • 10
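For reference, seeds can be registered with explicit document IDs, and the parent fields are read from the page's WebURL inside visit(). A sketch of the setup the question describes (the feed URL is a placeholder):

```java
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class ParentInfoCrawler extends WebCrawler {
    @Override
    public void visit(Page page) {
        WebURL url = page.getWebURL();
        // For a seed that was redirected, parentUrl/parentDocid can come
        // back null/0 -- the behaviour the question describes.
        System.out.println("docid=" + url.getDocid()
                + " parentDocid=" + url.getParentDocid()
                + " parentUrl=" + url.getParentUrl());
    }
}
// Seeds registered with explicit doc IDs, depth limited to the seeds:
//   config.setMaxDepthOfCrawling(0);
//   controller.addSeed("https://example.com/feed", 1);
```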
1
vote
1 answer

Why would using hdfs:// prefix for a path to a file allow a file to be opened?

I'm writing a hadoop job that crawls pages. The library I am using uses the file system to store crawl data while it crawls. I was sure that the library would have to be modified to use the HDFS since a completely different set of classes need to be…
Raj
  • 3,051
  • 6
  • 39
  • 57
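The general answer is that Hadoop resolves the FileSystem implementation from the URI scheme: an hdfs:// prefix selects HDFS, while a schemeless path falls back to whatever fs.defaultFS names, which is the local filesystem in a default configuration. A sketch illustrating the resolution (the namenode host and paths are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeResolution {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The URI scheme picks the FileSystem implementation:
        // "hdfs://" resolves to HDFS; a schemeless path falls back
        // to whatever fs.defaultFS names (the local FS by default).
        Path withScheme = new Path("hdfs://namenode:8020/crawl/frontier");
        Path withoutScheme = new Path("/crawl/frontier");

        System.out.println(withScheme.getFileSystem(conf).getClass().getName());
        System.out.println(withoutScheme.getFileSystem(conf).getClass().getName());
    }
}
```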
0
votes
1 answer

Scrape a Dynamic Website using Java with Selenium?

I'm trying to scrape https://www.rspca.org.uk/findapet#onSubmitSetHere to get a list of all pets for adoption. I've built web scrapers before using crawler4j but the websites were static. Since https://www.rspca.org.uk/findapet#onSubmitSetHere is…
breaktop
  • 1,899
  • 4
  • 37
  • 58
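Because crawler4j only fetches and parses the raw HTML, it never sees content that JavaScript renders afterwards; Selenium drives a real browser, so the rendered DOM is available. A minimal sketch, assuming ChromeDriver is installed:

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class DynamicScrape {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();  // needs chromedriver on PATH
        try {
            driver.get("https://www.rspca.org.uk/findapet");
            // Rendered DOM, including results that JavaScript filled in
            // after the initial HTML load.
            String renderedHtml = driver.getPageSource();
            System.out.println(renderedHtml.length() + " chars of rendered HTML");
        } finally {
            driver.quit();
        }
    }
}
```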
0
votes
1 answer

Feign client always throws a null pointer exception in a Spring Boot/Crawler4j app

I am running a Crawler4j instance in a Spring Boot application and my OpenFeign client is always null. public class MyCrawler extends WebCrawler { @Autowired HubClient hubClient; @Override public void visit(Page page) { // Lots of…
Nikolai Manek
  • 980
  • 6
  • 16
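The usual cause is that crawler4j instantiates WebCrawler subclasses itself, outside the Spring context, so field injection never runs. The common workaround is to construct the crawlers yourself and hand crawler4j a factory (recent crawler4j releases accept a WebCrawlerFactory in CrawlController.start). HubClient below is a stub standing in for the asker's Feign interface:

```java
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;

// Stub standing in for the asker's Feign interface (hypothetical).
interface HubClient {
    void send(String payload);
}

public class MyCrawler extends WebCrawler {
    private final HubClient hubClient;

    public MyCrawler(HubClient hubClient) {
        this.hubClient = hubClient;
    }

    @Override
    public void visit(Page page) {
        // hubClient is a real Spring-built proxy here, not null.
        hubClient.send(page.getWebURL().getURL());
    }
}

// From a Spring bean that has HubClient constructor-injected:
//   CrawlController.WebCrawlerFactory<MyCrawler> factory =
//           () -> new MyCrawler(hubClient);
//   controller.start(factory, numberOfCrawlers);
```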
0
votes
1 answer

Directing the search depths in Crawler4j Solr

I am trying to make the crawler "abort" searching a certain subdomain every time it doesn't find a relevant page after 3 consecutive tries. After extracting the title and the text of the page I start looking for the correct pages to submit to my…
ge0rgi0
  • 59
  • 1
  • 9
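crawler4j has no built-in "give up after N misses" switch, so this has to be custom logic. One possible approach: track consecutive irrelevant pages per subdomain in a shared map, reset the count on a hit, and refuse further URLs for that host in shouldVisit. In this sketch, isRelevant is a placeholder for the asker's relevance check before Solr submission:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class GiveUpCrawler extends WebCrawler {
    private static final int MAX_MISSES = 3;
    // Consecutive irrelevant pages seen per subdomain (shared across threads).
    private static final Map<String, Integer> misses = new ConcurrentHashMap<>();

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // Stop scheduling URLs for hosts that have missed too often.
        return misses.getOrDefault(url.getSubDomain(), 0) < MAX_MISSES;
    }

    @Override
    public void visit(Page page) {
        String host = page.getWebURL().getSubDomain();
        if (page.getParseData() instanceof HtmlParseData
                && isRelevant((HtmlParseData) page.getParseData())) {
            misses.put(host, 0);             // reset the streak on a hit
        } else {
            misses.merge(host, 1, Integer::sum);
        }
    }

    private boolean isRelevant(HtmlParseData html) {
        // Placeholder for the asker's relevance check before Solr submission.
        return html.getText() != null && html.getText().contains("keyword");
    }
}
```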
0
votes
1 answer

crawler4j detects lines between tags as text
