Questions tagged [crawler4j]

Crawler4j is an open source Java crawler which provides a simple interface for crawling the Web.

Reference: https://github.com/yasserg/crawler4j

174 questions
0
votes
1 answer

Convert a basic crawler4j crawler to a focused crawler

I've implemented a basic crawler that retrieves data from seed URLs and is able to download the pages. Further, I am able to keep my crawler on the same seed website until the specified depth is reached. How can I impose more restrictions on my…
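A focused crawler typically adds a content-relevance check on top of the depth limit. A minimal sketch, assuming crawler4j 4.x (older releases use shouldVisit(WebURL) without the referring page); the topic terms and seed host are placeholders:

    import java.util.Arrays;
    import java.util.List;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.parser.HtmlParseData;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class FocusedCrawler extends WebCrawler {

        // Hypothetical topic terms; substitute your own focus vocabulary.
        private static final List<String> TOPIC_TERMS =
                Arrays.asList("computer science", "machine learning");

        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            // Keep the existing restriction: stay on the seed host
            // (placeholder); the depth limit is set on CrawlConfig.
            return url.getURL().startsWith("https://example.com/");
        }

        @Override
        public void visit(Page page) {
            if (page.getParseData() instanceof HtmlParseData) {
                String text = ((HtmlParseData) page.getParseData()).getText().toLowerCase();
                // Focused part: only keep pages whose text matches the topic.
                if (TOPIC_TERMS.stream().anyMatch(text::contains)) {
                    System.out.println("Relevant: " + page.getWebURL().getURL());
                }
            }
        }
    }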
0
votes
0 answers

How to crawl latest articles in a specific domain using a specific set of websites?

I'm interested in building a program to get all the latest articles in a specific domain ("computer science") from a specific set of websites ("ScienceDirect", for example). As you know, some websites publish a page for each research article, such as:…
AmirHJ
  • 827
  • 1
  • 11
  • 21
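One way to restrict a crawl to a fixed set of publishers is a whitelist in shouldVisit. A sketch, assuming crawler4j 4.x; the site prefix is a placeholder for whatever sources you track:

    import java.util.Arrays;
    import java.util.List;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class SiteListCrawler extends WebCrawler {

        // Placeholder prefixes; one entry per website you want to follow.
        private static final List<String> SITE_PREFIXES =
                Arrays.asList("https://www.sciencedirect.com/");

        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            String target = url.getURL().toLowerCase();
            return SITE_PREFIXES.stream().anyMatch(target::startsWith);
        }
    }

Deciding which articles are "latest" still has to happen in visit(), e.g. by parsing the publication date out of each page.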
0
votes
1 answer

Parse a page (partly generated by JavaScript) by using Selenium

I've got a problem: I want to parse a page (e.g. this one) to collect information about the offered apps and save this information into a database. Moreover, I am using crawler4j for visiting every (available) page. But the problem - as I can see -…
Hisushi
  • 67
  • 1
  • 11
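crawler4j only sees the HTML the server returns, so JavaScript-generated content is missing from the parse data. A common workaround is to re-fetch the page with Selenium inside visit() so the scripts run first. A sketch, assuming Selenium WebDriver with a chromedriver on the PATH:

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;

    public class SeleniumAssistedCrawler extends WebCrawler {

        @Override
        public void visit(Page page) {
            String url = page.getWebURL().getURL();
            WebDriver driver = new ChromeDriver();
            try {
                // Load the page in a real browser so JavaScript executes.
                driver.get(url);
                String renderedHtml = driver.getPageSource();
                // Parse renderedHtml (e.g. with jsoup) and write to the database.
            } finally {
                driver.quit();
            }
        }
    }

Starting a browser per page is slow; in practice you would keep one WebDriver per crawler thread.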
0
votes
1 answer

Check HTTP Status for jpg files using jsoup

I am getting HTTP status codes for URLs using jsoup as follows: Connection.Response response = null Document doc = Jsoup.connect(url).ignoreContentType(true).get() response = Jsoup.connect(url) …
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
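For image URLs the trick is to call execute() rather than get(), since get() tries to parse the response body as an HTML Document. A sketch using plain jsoup:

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;

    public class StatusCheck {
        public static int statusOf(String url) throws java.io.IOException {
            Connection.Response response = Jsoup.connect(url)
                    .ignoreContentType(true)  // accept image/jpeg and other non-HTML bodies
                    .ignoreHttpErrors(true)   // report 4xx/5xx codes instead of throwing
                    .execute();
            return response.statusCode();
        }
    }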
0
votes
1 answer

Get seed of URL in crawler4j visit()

How do I get the seed a page came from inside crawler4j's visit() function? So far I have the URL of the page, but I can't figure out which seed led there. public void visit(Page page) { String url =…
pinpox
  • 179
  • 2
  • 10
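crawler4j does not hand the seed to visit() directly, but each WebURL carries its parent docid, so the seed can be propagated as the crawl proceeds. A sketch; the sentinel value for "no parent" is an assumption and should be verified against your crawler4j version:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;

    public class SeedTrackingCrawler extends WebCrawler {

        // docid -> docid of the seed it descends from, shared across threads.
        private static final Map<Integer, Integer> SEED_OF = new ConcurrentHashMap<>();

        @Override
        public void visit(Page page) {
            int docid = page.getWebURL().getDocid();
            int parent = page.getWebURL().getParentDocid();
            // Seeds have no parent docid (assumed <= 0 here); other pages
            // inherit the seed recorded for their parent.
            int seed = (parent <= 0) ? docid : SEED_OF.getOrDefault(parent, parent);
            SEED_OF.put(docid, seed);
            System.out.println(page.getWebURL().getURL() + " <- seed docid " + seed);
        }
    }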
0
votes
1 answer

Why is crawler4j hanging randomly?

I've been using crawler4j for a few months now. I recently started noticing that it hangs on some sites and never returns. The recommended solution is to set resumable to true. This is not an option for me as I am limited on space. I ran…
Salim
  • 199
  • 3
  • 18
0
votes
1 answer

Get Http status using crawler4j & Jsoup

I am creating a Groovy & Grails app using MongoDB in the backend. I am using crawler4j for crawling and JSoup for parsing. I need to get the HTTP status of a URL and save it to the database. I am trying the following: @Override void…
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
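WebCrawler has a hook that fires for every fetched URL before parsing, which is a natural place to record the status. A sketch, assuming crawler4j 4.x; replace the println with your database write:

    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class StatusRecordingCrawler extends WebCrawler {

        @Override
        protected void handlePageStatusCode(WebURL webUrl, int statusCode,
                                            String statusDescription) {
            // Called for every fetched URL, including non-200 responses.
            System.out.println(webUrl.getURL() + " -> " + statusCode);
        }
    }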
0
votes
1 answer

"Operation not allowed after ResultSet closed" with Datasource and crawler4j

After reading through a lot of similar questions I have not been able to find a solution that works for me. I have these methods: In a crawler4j Controller I do this: ArrayList urls = Urls.getURLs(100); for (String s : urls) { …
pinpox
  • 179
  • 2
  • 10
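That error usually means something reads the ResultSet after its statement or connection has been closed, which is easy to do when crawler threads share JDBC objects. The safe pattern is to copy the rows into a plain list before the crawl starts. A sketch; the table and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    public class Urls {
        // Materialize the rows so nothing touches the ResultSet after close.
        public static List<String> getURLs(DataSource ds, int limit) throws SQLException {
            List<String> urls = new ArrayList<>();
            try (Connection c = ds.getConnection();
                 PreparedStatement ps = c.prepareStatement("SELECT url FROM urls LIMIT ?")) {
                ps.setInt(1, limit);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        urls.add(rs.getString(1));
                    }
                }
            }
            return urls;
        }
    }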
0
votes
1 answer

Crawler4j not working for https urls

I am developing a Grails app using crawler4j. I know this is an old question, and I came across this solution here. I tried the solution provided but am not sure where to put the other fetcher and mockssl Java files. Also, I am not sure how…
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
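In later crawler4j releases the custom-fetcher workaround is usually unnecessary, because HTTPS support is just a CrawlConfig flag. A sketch (the storage path is a placeholder); if you do use the linked custom fetcher instead, its Java files simply go into your own source tree and the fetcher instance is passed to the CrawlController in place of the default PageFetcher:

    import edu.uci.ics.crawler4j.crawler.CrawlConfig;

    CrawlConfig config = new CrawlConfig();
    config.setCrawlStorageFolder("/tmp/crawl");  // placeholder path
    config.setIncludeHttpsPages(true);           // allow https:// seeds and links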
0
votes
1 answer

Crawler4j: calculate the depth of a page

I am developing a web crawler using Groovy & Grails and MongoDB. Is there any way to calculate the depth of a page using crawler4j? I know I can limit to what depth I want to crawl, but I haven't come across anything that suggests how to calculate depth…
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
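crawler4j already tracks depth on each WebURL, so no calculation is needed. A minimal sketch:

    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;

    public class DepthAwareCrawler extends WebCrawler {
        @Override
        public void visit(Page page) {
            // Seeds are depth 0, their outgoing links depth 1, and so on.
            short depth = page.getWebURL().getDepth();
            System.out.println("depth " + depth + ": " + page.getWebURL().getURL());
        }
    }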
0
votes
1 answer

Implementing Crawler4j with Selenium in Java doesn't work

I'm trying to use Crawler4j simultaneously with Selenium for some website testing. After a webpage is crawled, Selenium should immediately start a test with the parameters it got from the crawler, such as the URL it should open or the IDs of…
juzwani
  • 53
  • 2
  • 7
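One way to decouple the two tools is a producer-consumer hand-off: the crawler queues URLs and a separate Selenium thread tests them. A sketch, assuming Selenium WebDriver and crawler4j's non-blocking start:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;

    public class HandOffCrawler extends WebCrawler {

        // Crawler threads produce URLs; the Selenium worker consumes them.
        public static final BlockingQueue<String> TO_TEST = new LinkedBlockingQueue<>();

        @Override
        public void visit(Page page) {
            TO_TEST.offer(page.getWebURL().getURL());
        }

        // Run on its own thread next to controller.startNonBlocking(...).
        public static void seleniumWorker() throws InterruptedException {
            WebDriver driver = new ChromeDriver();
            try {
                while (true) {
                    driver.get(TO_TEST.take());
                    // ... run the test assertions against the loaded page ...
                }
            } finally {
                driver.quit();
            }
        }
    }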
0
votes
1 answer

Crawler4j: crawl jQuery live content

I have a website, but on its category page the product list is generated via JavaScript after the page loads, so my crawler visits it and can't find any products. How can I solve that problem? CrawlConfig config = new CrawlConfig(); …
Muhammet Arslan
  • 975
  • 1
  • 9
  • 33
0
votes
1 answer

Running crawler4j on multiple computers | different instances | Root Folder Lock

I'm trying to implement a crawler using crawler4j. It runs fine as long as: I run only one copy of it, and I run it continuously without restarting. If I restart the crawler, the URLs collected are not unique, because the crawler locks the root…
Lavneet
  • 516
  • 5
  • 19
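The lock comes from the Berkeley DB environment under the crawl storage folder, so two instances can never share one. Giving each instance its own folder and enabling resumable crawling keeps the collected URLs unique across restarts. A sketch; the path and the instance property are placeholders:

    import edu.uci.ics.crawler4j.crawler.CrawlConfig;

    CrawlConfig config = new CrawlConfig();
    String instanceId = System.getProperty("crawler.instance", "1");  // hypothetical property
    config.setCrawlStorageFolder("/data/crawl/instance-" + instanceId);
    config.setResumableCrawling(true);  // keep the frontier across restarts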
0
votes
1 answer

Crawler4j Stops Silently

In my application I am using crawler4j. Though the application is big, I have even tested the code with the sample code given here: https://code.google.com/p/crawler4j/source/browse/src/test/java/edu/uci/ics/crawler4j/examples/basic/ The problem is, it works…
akshayb
  • 1,219
  • 2
  • 18
  • 44
0
votes
1 answer

Java: The return type is incompatible with WebCrawler.visit(Page)

I'm using some crawler code from http://code.google.com/p/crawler4j/. Now, what I'm trying to do is to access every URL found in the MyCrawler class from another class. I start the crawler with: // * Start the crawl. This is a blocking operation,…
PinkPanties
  • 35
  • 1
  • 5
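That compiler error means the visit(Page) override declares a non-void return type; visit() must return void, so results have to be stashed somewhere the calling class can read them. A sketch; since controller.start(...) blocks, the list is complete when it returns:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;

    public class MyCrawler extends WebCrawler {

        // Thread-safe store the calling class can read after the crawl.
        public static final List<String> FOUND_URLS = new CopyOnWriteArrayList<>();

        @Override
        public void visit(Page page) {
            FOUND_URLS.add(page.getWebURL().getURL());
        }
    }

crawler4j also offers getMyLocalData()/getCrawlersLocalData() for collecting per-crawler results, which avoids the static field.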