Is there a limit on the number of documents that can be clustered with the carrot2 plugin for Elasticsearch?

For example, can carrot2 cluster at most 10,000 documents, or is there some other limitation like this?

Prashant

1 Answer

Carrot2 was designed to cluster small-to-medium collections of documents in real time. The typical range is a few hundred documents. A reasonable maximum for the Lingo algorithm is about 1,000 documents, while the STC algorithm should be able to handle up to around 10,000 documents. If you'd like to go beyond that, you may also want to check the commercial Lingo3G algorithm, which plugs into Carrot2.

Having said that, when clustering search results, the search engine first needs to fetch the contents of all the documents to be clustered, which can itself take a significant amount of time.
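
To make those rules of thumb concrete, here is a minimal Python sketch that caps the number of hits requested for clustering and picks an algorithm by size. It only builds a request body and does not contact a cluster. The body layout (`search_request`, `query_hint`, `field_mapping`, `algorithm`), the algorithm identifiers `"lingo"` and `"stc"`, and the `content` field are assumptions that may differ between plugin versions and indices, so check them against the documentation for the version you run. The helper name `build_cluster_request` is hypothetical.

```python
import json

# Approximate ceilings quoted above: rules of thumb, not hard limits.
LINGO_MAX_DOCS = 1_000
STC_MAX_DOCS = 10_000


def build_cluster_request(query_text, size):
    """Build a clustering request body, capping how many hits get clustered.

    The key names and algorithm identifiers are assumptions; verify them
    against the clustering plugin version you actually have installed.
    """
    if size <= 0:
        raise ValueError("size must be positive")

    # Never ask for more documents than STC is expected to handle.
    capped = min(size, STC_MAX_DOCS)

    # Lingo for smaller sets, STC once the set grows past Lingo's range.
    algorithm = "lingo" if capped <= LINGO_MAX_DOCS else "stc"

    return {
        "search_request": {
            # "content" is a hypothetical field name for this example.
            "query": {"match": {"content": query_text}},
            "size": capped,
        },
        "query_hint": query_text,
        "field_mapping": {"content": ["_source.content"]},
        "algorithm": algorithm,
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_request("data mining", 5000), indent=2))
```

Keeping `size` small matters for the second reason as well: every hit that is requested has to be fetched before clustering can start.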

Stanislaw Osinski