
I was researching crawler4j and found that it uses BerkeleyDB as its database. I am developing a Grails app using MongoDB and was wondering how flexible crawler4j would be to work within my application. I basically want to store the crawled information in the MongoDB database. Is it possible to configure crawler4j so that it uses MongoDB as the default datastore rather than BerkeleyDB? Any suggestions would be helpful. Thanks

clever_bassi
  • FYI, this is probably a better question to research in the [crawler4j](https://code.google.com/p/crawler4j/) documentation and issue queue. It looks like crawler4j only supports BerkeleyDB at the moment as there is no obvious configuration for database details. – Stennie Jul 05 '14 at 13:58

1 Answer


There is no configurable DAO layer, but you can manipulate it yourself.

There are three DAO classes. The Counters class saves the total 'Scheduled' and 'Processed' page counts (this is just for statistics). The DocIDServer class holds URL-ID pairs for resolving new URLs. The Frontier class holds the queue of pages to crawl. If you replace them, just keep the method logic and transaction blocks intact.
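To make the replacement concrete, here is a minimal sketch of the responsibilities of two of those classes (counters and URL-to-ID resolution), assuming you rewrite them against MongoDB. The class and method names below are illustrative, not crawler4j's actual API; an in-memory map stands in for the MongoDB collection, with comments marking where the driver calls would go.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a MongoDB-backed Counters DAO.
class MongoStyleCounters {
    // In a real port, each increment would be an atomic upsert on a
    // counters collection, e.g. with the MongoDB Java driver:
    //   collection.updateOne(Filters.eq("_id", name),
    //                        Updates.inc("value", 1),
    //                        new UpdateOptions().upsert(true));
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    public void increment(String name) {
        store.merge(name, 1L, Long::sum);
    }

    public long getValue(String name) {
        return store.getOrDefault(name, 0L);
    }
}

// Illustrative stand-in for a MongoDB-backed DocIDServer:
// assigns a stable numeric ID to each new URL, returns the
// existing ID for URLs seen before.
class MongoStyleDocIdServer {
    private final Map<String, Integer> urlToId = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(0);

    // In MongoDB this would be a findOneAndUpdate with upsert on a
    // collection keyed by the URL, so the check-and-insert is atomic.
    public int getOrCreateDocId(String url) {
        return urlToId.computeIfAbsent(url, u -> nextId.incrementAndGet());
    }
}

public class CrawlerDaoSketch {
    public static void main(String[] args) {
        MongoStyleCounters counters = new MongoStyleCounters();
        counters.increment("Scheduled");
        counters.increment("Scheduled");
        counters.increment("Processed");
        System.out.println(counters.getValue("Scheduled"));  // 2

        MongoStyleDocIdServer ids = new MongoStyleDocIdServer();
        int a = ids.getOrCreateDocId("http://example.com/");
        int b = ids.getOrCreateDocId("http://example.com/");
        System.out.println(a == b);  // true: same URL, same ID
    }
}
```

The key point the answer makes still applies: whatever storage you swap in, the check-and-insert in the DocIDServer and the queue operations in the Frontier must stay atomic, which in MongoDB maps naturally onto upserts and `findOneAndUpdate`.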

omerfarukdemir