Highest Voted 'crawler4j' Questions

0

votes

1 answer

Why is Crawler4j's non-blocking method not waiting for links in the queue?

Given this simple code: CrawlConfig config = new…

asked Feb 03 '16 at 18:58

Leszek Malinowski

111
2
7

0

votes

1 answer

Update Java Swing component from a different class

I am working on a crawler project using crawler4j, and on top of it I have a Swing interface. I have 2 different classes, namely controller.java (which also contains the Swing components) and crawler.java. I am attempting to append output processed by…

java swing instances crawler4j

asked Dec 08 '15 at 11:57

kenAu89

101
1
11

0

votes

1 answer

Why does this env object keep growing in size?

I have been working on a web crawler for some time now. The idea is simple: I have a SQL table containing a list of websites, and many threads fetch the first website from the table and delete it, then crawl it (in a heap like…

java memory-leaks web-crawler heap-memory crawler4j

asked Jul 27 '15 at 23:25

Abderrahmane Boulgheraif

23
2
7

0

votes

0 answers

Multi-thread web crawling with Crawler4j: Missing pages

I am using the multi-threaded crawler Crawler4j to crawl some websites. This crawler allows the user to define the number of crawler threads to run on a website. I decided to run the crawler up to depth/layer = 10 and crawl up to 501 pages per…

java web-crawler scrapy crawler4j

asked Jul 10 '15 at 18:09

Rushdi Shams

2,423
19
31

0

votes

1 answer

How to download text contained in JavaScript files via crawler4j?

I'm trying to use crawler4j to extract text from some websites. However, while I have changed the filters to allow extensions with js in the following manner: private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|gif|jpg" +…

javascript web-crawler crawler4j

asked Jun 16 '15 at 00:23

aardwolf

69
2
6

0

votes

1 answer

Crawler4j downloading articles

I'm trying to download articles from news portals using Crawler4j. I would like to store them in folders under categories such as 'sport', 'science', 'health', or any other category defined by that portal. URL parsing isn't enough since some portals don't use categories…

web-crawler crawler4j categorization

asked Apr 21 '15 at 12:07

Chris Lup

3
2

0

votes

1 answer

Need clarification on shouldVisit and visit methods of Crawler4j

I need to download PDFs from websites using Crawler4j. I am following this documentation to create two classes: PDFCrawler and PDFCrawlController. Now, in my PDFCrawler class, I have a shouldVisit(Page page, WebURL url) method as follows: public…

java pdf web-crawler crawler4j

asked Apr 08 '15 at 18:45

Rushdi Shams

2,423
19
31

0

votes

1 answer

How to parse a document using crawler4j

I wanted to parse all the documents containing some text I enter as "query" using crawler4j in Eclipse. Any ideas?

search web web-crawler crawler4j

asked Mar 19 '15 at 17:51

Bruno Fernandes

427
2
6
14

0

votes

1 answer

How to collect contact information from websites?

Does anyone know a web crawler tool for collecting contact details from a website? Say I have a www.website/contact. I want to pull out the address, phone number, etc. There are 2 tools I've been looking at: crawler4j, an open-source JAR for Java, and…

web-scraping web-crawler scrapy google-crawlers crawler4j

asked Mar 19 '15 at 11:06

azi_santos

105
1
12

0

votes

2 answers

JavaDoc for Crawler4j

I recently came across the crawler4j API for web crawling in Java, but while developing my custom crawler I found that no Javadoc is available for it. Does anybody know whether this API has Javadoc and, if so, where it is?

javadoc crawler4j

asked Mar 12 '15 at 14:30

Neeraj Jain

7,643
6
34
62

0

votes

1 answer

How to schedule crawler4j crawl control to run periodically?

I'm using crawler4j to build a simple web crawler. What I want to do is to invoke the crawl control every 10 minutes. I created a servlet that starts when my Tomcat server starts, and in the servlet I am using ScheduledExecutorService for the…

scheduledexecutorservice crawler4j

asked Feb 20 '15 at 19:20

rawPotato

33
6

0

votes

1 answer

Cannot Deploy Project involving Crawler4j

After adding the crawler4j jar file with its dependencies (I am not using Maven) to the classpath library, I try deploying and running the project, but my Glassfish 4.1 shows the following error: Severe: Exception during lifecycle…

java glassfish glassfish-4 crawler4j

asked Feb 18 '15 at 15:26

Sam Ebenezer

50
8

0

votes

2 answers

Can Crawler4j be run from another class?

I need to call Crawler4j from a different class. Instead of the main method in the Controller class I used a simple method called setup. class Controller { public void setup(String seed) { try { String rootFolder = "data/crawler"; …

crawler4j

asked Jan 26 '15 at 02:22

Mallik Kumar

540
1
5
28

0

votes

1 answer

How to retrieve all the user comments from a site?

I want all the user comments from this site: http://www.consumercomplaints.in/?search=chevrolet The problem is that the comments are only partially displayed, and to see the complete comment I have to click on the title above it, and this process has…

java comments excel-2013 crawler4j jericho-html-parser

asked Jan 07 '15 at 11:44

Parth Patel

9
4

0

votes

1 answer

Blocking Task on Java web application, and request timeout on Heroku server

I am new to Java web programming and am trying to make a web crawler using the Crawler4j sample code. My problem is that when I submit the repost request, the crawling task (which is a blocking task) takes some time to complete. Heroku hosting has a…

java heroku web web-crawler crawler4j

asked Nov 13 '14 at 17:08

Abdou Abderrahmane

31
2
10
