32

I am working on a Django project where I need to implement full-text search. I have looked at SOLR and seen good comments about it, but it is implemented in Java and would need a Java environment installed on the system alongside Python. Looking for a Python equivalent of SOLR, I found Whoosh, but I am not sure whether it is as efficient and robust as SOLR. Should I go with SOLR anyway, or are there better options than Whoosh and SOLR for Python?

Please suggest.

Thanks in advance

Ankit Jaiswal
  • 5
    Have a look at django-haystack. It provides an abstraction layer above Solr, Whoosh, Xapian and a couple of other search engines. With Haystack, you can start experimenting with Whoosh and later switch to a faster and/or more capable engine without too many code changes – Benjamin Wohlwend Jul 12 '10 at 11:20
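
To illustrate the backend switch described in the comment above, here is a minimal sketch of a settings fragment. It assumes django-haystack 2.x, which reads a `HAYSTACK_CONNECTIONS` dict (the 1.x releases current in 2010 used different setting names). The helper `haystack_connections`, the `whoosh_index` directory and the Solr URL are hypothetical placeholders, not something taken from the thread.

```python
import os

# Assumption: stand-in for the project root; a real settings.py
# usually defines BASE_DIR itself.
BASE_DIR = os.getcwd()


def haystack_connections(backend="whoosh"):
    """Return a HAYSTACK_CONNECTIONS dict for the chosen backend (hypothetical helper)."""
    if backend == "whoosh":
        return {
            "default": {
                "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
                # Directory where Whoosh keeps its index files (placeholder name).
                "PATH": os.path.join(BASE_DIR, "whoosh_index"),
            }
        }
    if backend == "solr":
        return {
            "default": {
                "ENGINE": "haystack.backends.solr_backend.SolrEngine",
                # Placeholder URL; point this at your own Solr instance.
                "URL": "http://127.0.0.1:8983/solr",
            }
        }
    raise ValueError("unsupported backend: %r" % backend)


# Start with Whoosh; moving to Solr later is a one-word change here,
# followed by rebuilding the index.
HAYSTACK_CONNECTIONS = haystack_connections("whoosh")
```

With this layout the search indexes and views written against Haystack should not need to change when the backend does, which is the point the comment makes.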

2 Answers

16

Whoosh is actually very fast for a Python-only implementation. That said, it's still at least an order of magnitude slower than a native engine such as SOLR or Xapian. Depending on the amount of data you need to index and search, and on your requirements for maximum allowable latency and concurrent searches, it may not be an option.

SOLR is a bit of a complicated beast, but it's by far the most comprehensive search solution. Mix it with solrpy for stunning results. Yes, you will need Java hosting.

You might also want to check out the Python bindings for Xapian. Xapian is very fast, but less of a complete solution than SOLR. It is GPL-licensed, though, so it may or may not be viable for you.

drxzcl
  • 1
    Yup, for me the concerns are performance and ease of implementation. – Ankit Jaiswal Jul 12 '10 at 10:40
  • 2
    If you can deploy native modules and have no problem with GPL code, I'd seriously evaluate xapian. It's fast and easy. SOLR is fast but not easy, Whoosh! is easy but not fast. – drxzcl Jul 12 '10 at 11:42
  • 4
    Whoosh in 2014 is not that bad: it's actually quite fast with STORAGE='file' on an SSD, and lightning fast with STORAGE='ram'. Xapian seems not to work very well with Haystack; I had to quickly switch to Whoosh because of overwhelming user complaints. – belteshazzar Jun 12 '14 at 13:11
  • 1
    Update 2016: Xapian works great, and on an SSD it's the fastest search I've used so far. – belteshazzar Oct 07 '16 at 12:32
3

I have used Lucene and Lucene extensions like SOLR and Nutch, and I found that Lucene pretty much satisfies what I need. I've only tried Whoosh once, but chose Lucene because 1) I am using Java and 2) I had trouble making UTF-8 work with Whoosh (not sure if it works out of the box now). In Lucene, I had no trouble working with Chinese characters.

If you're using Python as your programming language and Whoosh satisfies your needs, then I'd suggest you use it over the Java alternatives: you get better integration, avoid external dependencies, and can customize faster if you need to code additional functionality.

UPDATE: If you're interested in using Lucene, it has a Python wrapper; see http://lucene.apache.org/pylucene/

Manny
  • 1
    thanks for your reply Manny. However, I am curious to know if there is something like Lucene in python as well? – Ankit Jaiswal Jul 12 '10 at 11:20
  • Yes, however it's not a port from Java to Python, but it's a Python wrapper for Lucene. See http://lucene.apache.org/pylucene/ – Manny Jul 12 '10 at 11:33
  • 2
    BTW I found it infinitely easier to have python talk to SOLR (using solrpy or the RESTful interface) than work directly with the lucene bindings. YMMV. – drxzcl Jul 12 '10 at 11:43
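
As a rough sketch of what talking to SOLR over its HTTP interface from Python can look like, the snippet below uses only the standard library to build a query against Solr's `select` handler and read the JSON response. The base URL, the core name `articles`, the `title` field and both function names are assumptions made for illustration; none of them come from the thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's select handler, asking for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (base_url.rstrip("/"), core, params)


def search(base_url, core, query, rows=10):
    """Run the query and return the list of matching documents.

    Needs a reachable Solr instance, so it is not called in this sketch.
    """
    with urlopen(build_select_url(base_url, core, query, rows), timeout=5) as resp:
        payload = json.load(resp)
    # Solr's JSON responses keep the hits under response -> docs.
    return payload["response"]["docs"]


# Hypothetical core and field names, shown only to illustrate the URL shape.
url = build_select_url("http://localhost:8983/solr", "articles", "title:django")
print(url)
```

A client library such as solrpy wraps this kind of request for you; the plain-HTTP version mainly shows that no Java code is needed on the Python side, only a running Solr server.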