Highest Voted 'crawler4j' Questions

0

votes

1 answer

collect only relevant links from url

What I need is to collect the relevant links from a URL. For example, from a link like http://beechplane.wordpress.com/, I need to collect the links that contain the actual articles, i.e., links like…

asked Mar 17 '14 at 10:13

Dinoop Nair

2,663
6
31
51
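
One way to narrow a list of already-extracted links down to article pages is to match them against a permalink pattern. The sketch below assumes the blog uses date-based permalinks of the form /YYYY/MM/DD/slug/, which is an assumption and not something stated in the question; the class name `ArticleLinkFilter` and the sample URLs are hypothetical, and the code does not use crawler4j itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ArticleLinkFilter {
    // Assumed permalink shape: scheme://host/YYYY/MM/DD/slug with an optional trailing slash.
    private static final Pattern ARTICLE =
            Pattern.compile("^https?://[^/]+/\\d{4}/\\d{2}/\\d{2}/[^/]+/?$");

    // Returns only the links whose full text matches the assumed article pattern.
    public static List<String> keepArticles(List<String> links) {
        List<String> articles = new ArrayList<>();
        for (String link : links) {
            if (ARTICLE.matcher(link).matches()) {
                articles.add(link);
            }
        }
        return articles;
    }
}
```

If the site uses a different permalink scheme, only the regular expression would need to change.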

0

votes

0 answers

Quartz scheduler + crawler4J http connection error

I'm trying to combine Quartz scheduler with crawler4j. The problem is that when I execute the C4J code in a main method it works well, but in the Quartz Job execute() method there is an HTTP connection error. We are working behind a proxy but it's…

java http proxy quartz-scheduler crawler4j

asked Feb 05 '14 at 13:25

strategesim

327
2
3
13

0

votes

3 answers

crawl https pages with crawler4j

For months now we have used crawler4j to crawl an HTTPS site. Suddenly, since last Friday, we're not able to crawl the very same HTTPS site. Has something changed in the HTTPS protocol? The site is https://enot.publicprocurement.be/enot-war/home.do As a…

java ssl https crawler4j

asked Jan 28 '14 at 12:19

Heinz Uller

33
2
5

0

votes

1 answer

Crawler4j ImageCrawler String args

I'm trying to start the crawler4j example of: crawler4j. When I start the ImageCrawlController I already fail at the first step, args.length < 3, because it's 0. How can I make sure that args has at least 3 elements? public class ImageCrawlController { …

java web-crawler crawler4j

asked Nov 25 '13 at 19:33

csnewb

1,190
2
19
37
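
The `args` array is filled from the command line, so the length check only passes when the program is started with arguments (for example, `java CrawlArgs /tmp/crawl 4 /tmp/images`, or the equivalent "program arguments" field in an IDE run configuration). A minimal sketch of such a check follows; the class name `CrawlArgs` and the three parameter names are assumptions for illustration and are not taken from the crawler4j example.

```java
public class CrawlArgs {
    public final String rootFolder;       // hypothetical: folder for intermediate crawl data
    public final int numberOfCrawlers;    // hypothetical: number of crawler threads
    public final String storageFolder;    // hypothetical: folder for downloaded files

    private CrawlArgs(String rootFolder, int numberOfCrawlers, String storageFolder) {
        this.rootFolder = rootFolder;
        this.numberOfCrawlers = numberOfCrawlers;
        this.storageFolder = storageFolder;
    }

    // Validates the command-line arguments; throws if fewer than three are given
    // or if the second one is not an integer.
    public static CrawlArgs parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                    "Usage: <rootFolder> <numberOfCrawlers> <storageFolder>");
        }
        return new CrawlArgs(args[0], Integer.parseInt(args[1]), args[2]);
    }

    public static void main(String[] args) {
        try {
            CrawlArgs parsed = parse(args);
            System.out.println("Crawling with " + parsed.numberOfCrawlers + " crawler(s)");
        } catch (IllegalArgumentException e) {
            // Reached when the program is started without the expected arguments.
            System.err.println(e.getMessage());
        }
    }
}
```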

0

votes

1 answer

Calling Controller.Start in loop in Crawler4j?

I asked a question here, but this is a different question that sounds similar. Using crawler4j, I want to crawl multiple seed URLs with a restriction on the domain name (that is, a domain name check in shouldVisit). Here is an example of how to do it.…

java web-crawler crawler4j

asked Nov 09 '13 at 12:09

akshayb

1,219
2
18
44
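
The domain check described in this question can be sketched independently of crawler4j. The example below is a minimal illustration using only the standard library; the class name `DomainFilter` is hypothetical, and in a real crawler the same logic would be called from the overridden shouldVisit method with the URL's string form.

```java
import java.net.URI;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DomainFilter {
    private final Set<String> allowedDomains = new HashSet<>();

    // Stores the allowed domains in lower case so the comparison is case-insensitive.
    public DomainFilter(Collection<String> domains) {
        for (String d : domains) {
            allowedDomains.add(d.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true when the URL's host equals an allowed domain or is a subdomain of one.
    public boolean shouldVisit(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String domain : allowedDomains) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the parsed host, instead of a plain string prefix or substring test, avoids accepting look-alike hosts such as example.com.evil.org.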

0

votes

1 answer

How to fix error "Failed to load Main-Class manifest from ..."

I downloaded crawler4j from [https://code.google.com/p/crawler4j/downloads/detail?name=crawler4j-3.5.zip&can=2&q=] and saved it on my desktop. After I run crawler4j-3.5.jar, an error is displayed: "Failed to load Main-Class manifest from ..." How can I fix…

java manifest crawler4j

asked Jul 02 '13 at 16:07

MP3

13
2
7

0

votes

1 answer

What is a .lck file and why can't I read it with a buffered reader?

I'm trying to use crawler4j to crawl websites. I was able to follow the instructions on the crawler4j website. When it is done it creates a folder with two different .lck files, one .jdb file and one .info.0 file. I tried to read in the file using…

java parsing file-io web-crawler crawler4j

asked Mar 27 '13 at 13:32

j.jerrod.taylor

1,120
1
13
33

0

votes

2 answers

Some information about pattern matching in a Java web crawler using the crawler4j library

I want to implement a very simple web crawler using Java and I have found this library: crawler4j: http://code.google.com/p/crawler4j/ I need a crawler that does the following: start from a URL (specified by me) and recognize if in the current…

java pattern-matching web-crawler crawler4j

asked Feb 21 '13 at 16:33

AndreaNobili

40,955
107
324
596

0

votes

1 answer

How to run crawler4j.jar with MyCrawler.java Controller.java files

I am new to crawlers and I want to run my first crawler program. I have three files: Crawler4j.jar, Mycrawler.java, and Controller.java. When I enter javac -cp crawler4j-3.1.jar MyCrawler.java Controller.java at the terminal I get the following…

web-crawler crawler4j

asked Jan 19 '13 at 10:38

user1992664

1
1

0

votes

3 answers

Erroneous tree type in Java

I am trying to run the following code for BasicCrawlController in Java but I get an error: /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this…

java crawler4j

asked Oct 21 '12 at 09:44

orezvani

3,595
8
43
57

0

votes

2 answers

Accessing .lck and jdb files stored via web crawler

I'm currently using crawler4j as my web crawler of choice, and I am trying to teach myself how web crawlers work. I've started the crawl and I expected it to quickly return the crawled data at crawlStorageFolder (/data/crawl/root) seen below public…

java parsing web-crawler crawler4j

asked Sep 22 '12 at 21:53

Octavius

583
5
19

0

votes

3 answers

Determining parameters on crawler4j

I am trying to use crawler4j like it was shown to be used in this example and no matter how I define the number of crawlers or change the root folder I continue to get this error from the code stating: "Needed parameters: rootFolder (it will…

java html parsing web-crawler crawler4j

asked Sep 21 '12 at 00:17

Octavius

583
5
19

0

votes

1 answer

Java - Eclipse - The declared package "edu.uci.ics.crawler4j.examples.basic" does not match the expected package ""

I am trying to set up the example code for Crawler4j, but Eclipse is throwing an error that I don't understand. The error is: The declared package "edu.uci.ics.crawler4j.examples.basic" does not match the expected package "" The path…

java eclipse crawler4j

asked Sep 14 '12 at 14:01

Crayl

1,883
7
27
43

0

votes

2 answers

Selectively disable log4j debug log in Play console

I have a Play 2.0 app and ran play console from the command line. One of the libraries I use relies on log4j and started to stream debug output for crawler4j. I'm trying to figure out how to selectively disable that output in the play…

scala playframework log4j playframework-2.0 crawler4j

asked Jul 24 '12 at 20:16

Bob

8,424
17
72
110

0

votes

2 answers

Controlling the list of URL(s) to be crawled at runtime

In crawler4j we can override the function boolean shouldVisit(WebURL url) and control whether that particular URL is crawled by returning true or false. But can we add URL(s) at runtime? If yes, what are the ways to do that…

java web-crawler crawler4j

asked Jul 14 '12 at 09:32

user801154
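
The idea of adding URLs while a crawl is running can be illustrated with a small frontier queue that is guarded by a shouldVisit-style predicate. This is a minimal sketch using only the standard library, not crawler4j's own mechanism; the class `RuntimeFrontier` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

public class RuntimeFrontier {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private final Predicate<String> shouldVisit;

    public RuntimeFrontier(Predicate<String> shouldVisit) {
        this.shouldVisit = shouldVisit;
    }

    // Adds a URL at any time; returns false if the predicate rejects it
    // or the URL was already offered before.
    public synchronized boolean offer(String url) {
        if (!shouldVisit.test(url) || !seen.add(url)) {
            return false;
        }
        pending.add(url);
        return true;
    }

    // Returns the next URL to fetch, or null when nothing is pending.
    public synchronized String next() {
        return pending.poll();
    }
}
```

Both methods are synchronized so that crawler threads can pull URLs while another thread keeps offering new ones.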
