I was researching crawler4j and found that it uses BerkeleyDB as its datastore. I am developing a Grails app that uses MongoDB and was wondering how flexible crawler4j would be to work within my application. I basically want to store the crawled information in the MongoDB database. Is it possible to configure crawler4j so that it uses MongoDB as the default datastore rather than BerkeleyDB? Any suggestions would be helpful. Thanks.
FYI, this is probably a better question to research in the [crawler4j](https://code.google.com/p/crawler4j/) documentation and issue queue. It looks like crawler4j only supports BerkeleyDB at the moment as there is no obvious configuration for database details. – Stennie Jul 05 '14 at 13:58
1 Answer
There is no configurable DAO layer, but you can manipulate it.
There are three DAO classes: the Counters class stores the total 'Scheduled' and 'Processed' page counts (this is just for statistics), the DocIDServer class holds URL-ID pairs for resolving new URLs, and the Frontier class holds the queue of pages to crawl. If you swap any of them out, keep the method logic and transaction blocks intact.
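
For illustration, here is a minimal sketch of what a MongoDB-backed replacement for the DocIDServer's URL-to-ID mapping could look like. This is not part of crawler4j; the method names (`getDocId`, `getNewDocID`, `isSeenBefore`) are assumed to mirror the BerkeleyDB-backed original in the crawler4j version you use, and the `crawler` database and `docIds` collection names are placeholders. It uses the standard MongoDB Java driver:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Hypothetical MongoDB-backed stand-in for crawler4j's DocIDServer.
// Adapt the method signatures to the crawler4j version you depend on.
public class MongoDocIDServer {
    private final MongoCollection<Document> docIds;
    private int lastDocId = 0; // in practice, seed this from the max docId in the collection

    public MongoDocIDServer(String connectionString) {
        MongoClient client = MongoClients.create(connectionString);
        this.docIds = client.getDatabase("crawler").getCollection("docIds");
    }

    // Returns the stored doc id for a URL, or -1 if the URL is unseen.
    public synchronized int getDocId(String url) {
        Document doc = docIds.find(Filters.eq("url", url)).first();
        return doc == null ? -1 : doc.getInteger("docId");
    }

    // Assigns and persists a new doc id for an unseen URL.
    public synchronized int getNewDocID(String url) {
        int existing = getDocId(url);
        if (existing > 0) {
            return existing;
        }
        lastDocId++;
        docIds.insertOne(new Document("url", url).append("docId", lastDocId));
        return lastDocId;
    }

    public boolean isSeenBefore(String url) {
        return getDocId(url) != -1;
    }
}
```

Note that the original class keeps its updates transactional; the `synchronized` methods above only guard against concurrent access within one JVM, so for a multi-threaded crawl you would want a unique index on `url` and an upsert instead of the read-then-insert shown here.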

– omerfarukdemir