Questions tagged [crawler4j]

Crawler4j is an open source Java web crawler that provides a simple interface for crawling the Web.

Reference: https://github.com/yasserg/crawler4j

174 questions
3
votes
2 answers

How to disable Crawler4J logger?

I am crawling using Crawler4J and I don't want it to print log messages, but Crawler4J has a logger built in. How can I disable the logger inside the Crawler4J library? (A sketch of one approach follows this entry.)
이현규
  • 97
  • 1
  • 7
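
Crawler4j logs through SLF4J, so the cleanest way to silence it is to configure the logging backend rather than crawler4j itself. A minimal sketch, assuming Logback is the SLF4J binding on the classpath (other bindings need their own configuration):

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class QuietCrawler4j {
    public static void main(String[] args) {
        // Assumes the SLF4J binding is Logback; casting the SLF4J logger to the
        // Logback implementation exposes setLevel(). OFF silences the whole package.
        Logger crawler4jLogger = (Logger) LoggerFactory.getLogger("edu.uci.ics.crawler4j");
        crawler4jLogger.setLevel(Level.OFF);
        // ...start the CrawlController as usual after this point.
    }
}
```

The same effect can be achieved declaratively with a `<logger name="edu.uci.ics.crawler4j" level="OFF"/>` entry in logback.xml.
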
3
votes
2 answers

Crawler4j: some URLs are crawled without issue while others are not crawled at all

I have been playing around with Crawler4j and have successfully had it crawl some pages, but have had no success crawling others. For example, I have gotten it to successfully crawl Reddit with this code: public class Controller { public static void…
theGuy05
  • 417
  • 1
  • 7
  • 22
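
For reference, a minimal crawler4j controller using the 4.x-style API looks roughly like the sketch below; the storage folder and seed URL are placeholders. When some sites are never crawled, the usual suspects are an overly strict shouldVisit() filter, redirects to a domain the filter rejects, or a robots.txt disallow.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j");  // placeholder storage folder
        config.setPolitenessDelay(1000);                 // one request per second per host

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("https://www.example.com/");  // placeholder seed URL
        controller.start(MyCrawler.class, 4);            // MyCrawler extends WebCrawler
    }
}
```
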
3
votes
3 answers

How to scrape using crawler4j?

I've been going at this for 4 hours now, and I simply can't see what I'm doing wrong. I have two files: MyCrawler.java and Controller.java. MyCrawler.java: import edu.uci.ics.crawler4j.crawler.Page; import…
rockstardev
  • 13,479
  • 39
  • 164
  • 296
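
The companion crawler class is typically a small WebCrawler subclass like the sketch below (crawler4j 4.x signatures, where shouldVisit() also receives the referring page); the domain filter and output are placeholders.

```java
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // Skip common binary/static resources; the pattern is only an example.
    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpe?g|png|pdf|zip))$");

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        String href = url.getURL().toLowerCase();
        // Stay on one (placeholder) site and skip filtered file types.
        return !FILTERS.matcher(href).matches()
                && href.startsWith("https://www.example.com/");
    }

    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData html = (HtmlParseData) page.getParseData();
            // Title and plain text are available without any extra parsing.
            System.out.println(page.getWebURL().getURL() + " -> " + html.getTitle());
        }
    }
}
```
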
3
votes
1 answer

Grails: Pass value from controller to thread

In my project, the action of my Grails controller creates a new thread and calls a class from the src/groovy folder each time this action is executed. I need to pass a value from this action to the new thread being created. How can I achieve…
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
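
This is really a plain Java/Groovy concurrency question rather than a crawler4j one. The usual approach, sketched here in Java with hypothetical names, is to hand the value to the worker through its constructor before the thread is started:

```java
public class CrawlJob implements Runnable {

    private final String seedUrl;  // the value handed over from the controller action

    public CrawlJob(String seedUrl) {
        this.seedUrl = seedUrl;
    }

    @Override
    public void run() {
        // The worker owns its own copy of the value; no shared mutable state is needed.
        System.out.println("Crawling seed: " + seedUrl);
    }

    public static void main(String[] args) {
        // In a Grails action this would look like: new Thread(new CrawlJob(params.url)).start()
        new Thread(new CrawlJob("https://www.example.com/")).start();
    }
}
```
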
3
votes
1 answer

Params for WebCrawler in crawler4j

Is it possible to pass params to a WebCrawler? For example, I want to pass a new rule to the WebCrawler.shouldVisit(WebURL url) method at runtime, or set some field in my WebCrawler. Is it possible? (A sketch of one approach follows this entry.)
chinchilla
  • 93
  • 5
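
Recent crawler4j releases (roughly 4.2 and later; check your version) accept a WebCrawlerFactory in CrawlController.start(), which lets each crawler instance be constructed with runtime parameters. A rough sketch with a hypothetical ParamCrawler:

```java
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

// Hypothetical crawler that takes its visit rule as a constructor argument.
public class ParamCrawler extends WebCrawler {

    private final String allowedPrefix;

    public ParamCrawler(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix;
    }

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // The rule is no longer hard-coded; it came in at construction time.
        return url.getURL().startsWith(allowedPrefix);
    }
}
```

In the controller, the factory overload then builds each instance with the runtime value, e.g. controller.start(() -> new ParamCrawler("https://www.example.com/news/"), 4). Older versions instead exposed controller.setCustomData(...), which the crawler could read back via getMyController().getCustomData().
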
3
votes
1 answer

Set values from src/groovy classes to domain class properties

I'm working on crawler4j using Groovy and Grails. I have a BasicCrawler.groovy class in src/groovy, the domain class Crawler.groovy, and a controller called CrawlerController.groovy. I have a few properties in the BasicCrawler.groovy class, like url,…
clever_bassi
  • 2,392
  • 2
  • 24
  • 43
3
votes
1 answer

How to parse the HTML when using crawler4j

Recently, I had to crawl some websites with the open source project crawler4j. However, crawler4j didn't offer any API for this. Now I have the problem of how to parse HTML with the functions and classes provided by crawler4j and find elements like we…
mly
  • 31
  • 2
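
crawler4j itself only hands back the page source and extracted text through HtmlParseData; for element-level queries it is common to pair it with a standalone parser such as jsoup (a separate dependency, assumed in this sketch):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;

public class ParsingCrawler extends WebCrawler {
    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            String html = ((HtmlParseData) page.getParseData()).getHtml();
            // jsoup turns the raw HTML into a queryable DOM; the base URI helps resolve links.
            Document doc = Jsoup.parse(html, page.getWebURL().getURL());
            for (Element heading : doc.select("h1")) {   // CSS-style selector
                System.out.println(heading.text());
            }
        }
    }
}
```
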
3
votes
2 answers

Replace all URLs in an HTML file

I'm crawling some HTML files with crawler4j and I want to replace all links in those pages with custom links. Currently I can get the source HTML and a list of all outgoing links with this code: HtmlParseData htmlParseData = (HtmlParseData)…
Alireza Noori
  • 14,961
  • 30
  • 95
  • 179
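
Given the HtmlParseData the question already obtains, one blunt but simple approach is to iterate over getOutgoingUrls() and substitute each link in the source string; rewriting href attributes with an HTML parser would be more robust. A sketch with a placeholder rewrite scheme:

```java
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class LinkRewriter {

    /**
     * Replaces every outgoing link in the page source with a custom link
     * (here a placeholder proxy-style URL). Note that plain string replacement
     * touches every occurrence of the URL, not only href attributes.
     */
    public static String rewrite(HtmlParseData htmlParseData) {
        String html = htmlParseData.getHtml();
        for (WebURL link : htmlParseData.getOutgoingUrls()) {
            String original = link.getURL();
            String custom = "https://my-proxy.example.com/?u=" + original;  // placeholder scheme
            html = html.replace(original, custom);
        }
        return html;
    }
}
```
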
3
votes
1 answer

Browse .jdb output?

I am running crawler4j and the output goes to the directory /frontier/. The files in this directory are 00000000.jdb, je.info.0, je.info.lck, and je.lck; the .jdb file is the only one with data, and the other three have zero bytes. I am not sure what to do…
KDEx
  • 3,505
  • 4
  • 31
  • 39
2
votes
2 answers

Efficient design of crawler4J to get data

I am trying to get data from various websites. After searching on Stack Overflow, I am using crawler4j, as many suggested it. Below is my understanding/design: 1. Get sitemap.xml from robots.txt. 2. If sitemap.xml is not available in robots.txt,…
topblog
  • 93
  • 2
  • 7
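
For step 1 of this design: the sitemap locations in robots.txt are plain lines of the form "Sitemap: <url>", and crawler4j does not (to my knowledge) expose a helper for reading them, so a small JDK-only fetch-and-scan is enough. A sketch:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SitemapLocator {
    /** Returns the sitemap URLs advertised in a site's robots.txt, if any. */
    public static List<String> sitemapsFromRobots(String site) throws Exception {
        List<String> sitemaps = new ArrayList<>();
        URL robots = new URL(site + "/robots.txt");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(robots.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.toLowerCase().startsWith("sitemap:")) {
                    sitemaps.add(line.substring("sitemap:".length()).trim());
                }
            }
        }
        return sitemaps;   // empty list means: fall back to crawling from the home page
    }
}
```
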
2
votes
1 answer

What sequence of steps does crawler4j follow to fetch data?

I'd like to learn how crawler4j works. Does it fetch a web page, then download its content and extract it? What about the .db and .csv files and their structures? Generally, what sequence does it follow? Please, I want a descriptive answer. Thanks
Ahmed Sakr
  • 129
  • 1
  • 9
2
votes
2 answers

Web Crawler vs Html Parser

What is the difference between a web crawler and a parser? In Java there are names for the various fetching libraries; for example, Nutch is called a crawler and jsoup a parser. Do they serve the same purpose? Are they fully equivalent for the job? Thanks. (A short illustration follows this entry.)
Ahmed Sakr
  • 129
  • 1
  • 9
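
Roughly: a crawler (Nutch, crawler4j) decides which URLs to fetch, downloads them, and follows the links it finds, while a parser (jsoup) only turns one already-fetched document into a structure you can query. The contrast below is just an illustration, using jsoup's own convenience fetcher for the single-page fetch:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CrawlerVsParser {
    public static void main(String[] args) throws Exception {
        // Fetching (what a crawler does, here via jsoup's convenience method for one page):
        Document doc = Jsoup.connect("https://www.example.com/").get();

        // Parsing (what a parser does): query the structure of the fetched document.
        System.out.println("Title: " + doc.title());
        System.out.println("Links found: " + doc.select("a[href]").size());
        // A crawler would now queue those links and repeat; a parser stops here.
    }
}
```
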
2
votes
2 answers

Is it possible to retrieve website content with Crawler4j?

I am very new to Java. Now, I want to retrieve the news article contents using a Google News search with the keyword "toy", from page 1 to page…
evabb
  • 405
  • 3
  • 21
2
votes
1 answer

crawler4j seems to be ignoring the robots.txt file... how do I fix it?

I am working on a project to crawl a small web directory and have implemented a crawler using crawler4j. I know that the RobotstxtServer should be checking whether a file is allowed/disallowed by the robots.txt file, but mine is still showing a…
drewfiss90
  • 53
  • 5
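
Two settings commonly matter here: robots.txt handling must be enabled on RobotstxtConfig, and the user-agent name it checks against should match the agent the crawler sends, otherwise the wrong robots.txt record may be consulted. A sketch under those assumptions, with a placeholder agent name:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class RobotsAwareSetup {
    public static RobotstxtServer build(CrawlConfig config) {
        config.setUserAgentString("my-study-crawler");        // placeholder agent name

        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        robotstxtConfig.setEnabled(true);                      // obey robots.txt (explicit here)
        robotstxtConfig.setUserAgentName("my-study-crawler");  // match the crawl user agent

        PageFetcher pageFetcher = new PageFetcher(config);
        return new RobotstxtServer(robotstxtConfig, pageFetcher);
    }
}
```
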
2
votes
1 answer

crawler4j asynchronously saving results to file

I'm evaluating crawler4j for ~1M crawls per day. My scenario is this: I'm fetching the URL and parsing its description, keywords, and title; now I would like to save each URL and its words into a single file. I've seen how it's possible to save crawled…
Gideon
  • 2,211
  • 5
  • 29
  • 47
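
A common pattern at this volume, sketched below with hypothetical names, is to have every crawler thread drop a line onto a shared BlockingQueue and let a single dedicated writer thread append to the file, so crawl threads never block on disk I/O:

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResultWriter implements Runnable {
    // Crawler threads offer "url<TAB>words" lines here from visit().
    public static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(100_000);

    @Override
    public void run() {
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("crawl-results.tsv"), StandardCharsets.UTF_8)) {
            while (!Thread.currentThread().isInterrupted()) {
                String line = QUEUE.take();   // blocks until a crawler produces a result
                out.write(line);
                out.newLine();
            }
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Inside visit(), each crawler then just calls ResultWriter.QUEUE.offer(url + "\t" + words); the writer thread is started once before controller.start().
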