I was researching crawler4j and found that it uses BerkeleyDB as its datastore. I am developing a Grails app that uses MongoDB and was wondering how flexible crawler4j would be to work within my application. I basically want to store the crawled information in the MongoDB database. Is it possible to configure crawler4j so that it uses MongoDB as the default datastore rather than BerkeleyDB? Any suggestions would be helpful. Thanks.
FYI, this is probably a better question to research in the [crawler4j](https://code.google.com/p/crawler4j/) documentation and issue queue. It looks like crawler4j only supports BerkeleyDB at the moment as there is no obvious configuration for database details. – Stennie Jul 05 '14 at 13:58
1 Answer
There is no configurable DAO layer, but you can manipulate it.
There are three DAO classes: the Counters class stores the total 'Scheduled' and 'Processed' page counts (this is just for statistics), the DocIDServer class holds URL-ID pairs for resolving new URLs, and the Frontier class holds the queue of pages to crawl. If you swap any of them out, keep the method logic and transaction blocks intact.
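
For illustration, here is a minimal sketch of what a MongoDB-backed replacement for the DocIDServer's URL-to-ID mapping could look like. This is not part of crawler4j; the method names (`getDocId`, `getNewDocID`, `isSeenBefore`) are assumed to mirror the BerkeleyDB-backed original in the crawler4j version you use, and the `crawler` database and `docIds` collection names are placeholders. It uses the standard MongoDB Java driver:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Hypothetical MongoDB-backed stand-in for crawler4j's DocIDServer.
// Adapt the method signatures to the crawler4j version you depend on.
public class MongoDocIDServer {
    private final MongoCollection<Document> docIds;
    private int lastDocId = 0; // in practice, seed this from the max docId in the collection

    public MongoDocIDServer(String connectionString) {
        MongoClient client = MongoClients.create(connectionString);
        this.docIds = client.getDatabase("crawler").getCollection("docIds");
    }

    // Returns the stored doc id for a URL, or -1 if the URL is unseen.
    public synchronized int getDocId(String url) {
        Document doc = docIds.find(Filters.eq("url", url)).first();
        return doc == null ? -1 : doc.getInteger("docId");
    }

    // Assigns and persists a new doc id for an unseen URL.
    public synchronized int getNewDocID(String url) {
        int existing = getDocId(url);
        if (existing > 0) {
            return existing;
        }
        lastDocId++;
        docIds.insertOne(new Document("url", url).append("docId", lastDocId));
        return lastDocId;
    }

    public boolean isSeenBefore(String url) {
        return getDocId(url) != -1;
    }
}
```

Note that the original class keeps its updates transactional; the `synchronized` methods above only guard against concurrent access within one JVM, so for a multi-threaded crawl you would want a unique index on `url` and an upsert instead of the read-then-insert shown here.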

– omerfarukdemir