3

I am trying to implement Haystack search for my website with the Whoosh back-end. I have set up the haystack app successfully and can search the model I registered, but when I create a search_indexes.py file for another app, I run into the following issue:

I have two models: Member and Events. I created a search_indexes.py for each of them, plus the corresponding /search/... _text.txt files in the template folder. Then I ran ./manage.py rebuild_index

I get the following message:

Indexing 8 events  
Indexing 5 members

However, I am not able to see 13 indexed items:

 $ ./manage.py shell
 >>> from haystack.query import SearchQuerySet
 >>> sqs = SearchQuerySet().all()
 >>> print(sqs.count())
 8

And these are the 8 events that were indexed. Consequently, from the website, I can only search the events, not the members. Deleting the search_indexes.py file from the 'Event' app folder and redoing everything indexes the 5 members correctly and they can be searched as usual. What could be the reason for this?
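
A per-model breakdown makes it easier to see which models are missing from the results. The following is only a minimal sketch: count_by_model is a hypothetical helper, and it assumes each result exposes app_label and model_name attributes, as Haystack's search results do.

```python
from collections import Counter


def count_by_model(results):
    """Tally results per 'app_label.model_name' label.

    `results` is any iterable of objects that have `app_label` and
    `model_name` attributes (assumed here; Haystack search results do).
    """
    return Counter("%s.%s" % (r.app_label, r.model_name) for r in results)


# Inside ./manage.py shell this could be called as, for example:
#   from haystack.query import SearchQuerySet
#   print(count_by_model(SearchQuerySet().all()))
```

If the members were searchable, the tally would contain an entry for the member model next to the one for events.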

Update: I added search_indexes.py files to other apps as well, to see whether they are indexed properly. I get the following message on rebuilding the index:

Indexing 8 events.  
Indexing 4 guests.     
Indexing 5 members.    
Indexing 8 sponsors.    

Now it is indexing all the events and members but none of the guests and sponsors. I am able to search for events and members but not for the other two (using both the SearchQuerySet API and the website).

Update: The issue seems to have been resolved by changing the source of haystack.backends.whoosh_backend. Please see the answers.

Danny Beckett
Vikesh

2 Answers

2

I've had the same problem the past couple of days (nice timing). I decided to start where you left off and see if I couldn't isolate the cause a bit better.

The narrowed results are (at least partially) generated by a query of the models which are registered to the site (L298 and on). For my code, the query it generates is...

django_ct:(barnaby.tag OR barnaby.userprofile)

...which gives a resultset with only barnaby.tag models. However, if I run...

django_ct:(barnaby.tag OR barnaby.userprofile) (username:pfrazee OR name:Tag114)

...I end up getting results from both tag and userprofile. I can only assume that's a problem with Whoosh, but I can't say for sure. We should probably contact Haystack and/or Whoosh about it.
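
To make the shape of that narrowing query concrete, here is a minimal sketch of how such a clause could be assembled from the registered model labels. build_narrow_query is a hypothetical helper written for illustration, not Haystack's actual code; only the django_ct field name and the two model labels come from the queries above.

```python
def build_narrow_query(model_labels, field="django_ct"):
    """Join 'app_label.model_name' labels into one OR clause on `field`.

    Hypothetical illustration of the narrowing query's shape; it does not
    reproduce the backend's real implementation.
    """
    return "%s:(%s)" % (field, " OR ".join(sorted(model_labels)))


print(build_narrow_query(["barnaby.tag", "barnaby.userprofile"]))
# prints: django_ct:(barnaby.tag OR barnaby.userprofile)
```

Running the generated string directly against the index is a quick way to check whether the narrowing step alone already loses one of the models.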

At any rate, you can avoid this problem without altering haystack by setting this:

HAYSTACK_LIMIT_TO_REGISTERED_MODELS = False
Paul Frazee
    Thanks Paul, the last line is the answer. It works for me without altering the source of Haystack :). I believe it's not really a problem; it's just the way Haystack narrows its results when that setting is True. We could understand the issue better by reading the source code of Whoosh and Haystack, but I am in no mood to do that. – Vikesh Apr 06 '11 at 04:08
1

Okay, so here's what I did to find out whether the problem is in Whoosh or Haystack. I opened the Django shell and searched Whoosh directly for a term that was not showing up in a search through Haystack's SearchQuerySet API:

$ ./manage.py shell
>>> from whoosh.query import Term
>>> from whoosh.index import open_dir
>>> ix = open_dir('/home/somedir/my_project/haystack/whoosh/')
>>> ix.schema
<Schema: ['branch', 'category', 'coordinator', 'date_event', 'designation', 'details', 'django_ct', 'django_id', 'name', 'organisation', 'overview', 'text', 'title']>
>>> searcher = ix.searcher()
>>> res = searcher.search(Term('text', u'pink'))
>>> print(res)
<Top 1 Results for Term('text', 'pink') runtime=0.000741004943848>
>>> print(res[0]['name'])
Pink Floyd

So Whoosh is indexing all the data correctly. Next, I tried the same search through the SearchQuerySet API:

$ ./manage.py shell
>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().filter(content='pink')
>>> sqs
[]

So I realized I had to look at the whoosh_backend.py file of the Haystack library to see what was happening. Open haystack.backends.whoosh_backend and, around line 345, change

# Comment out these two lines, because raw_results becomes empty
# after the filter() call for some queries.
if narrowed_results:
    raw_results.filter(narrowed_results)

to

# if narrowed_results:
#     raw_results.filter(narrowed_results)

And then it works. The SearchQuerySet API returns exactly one result for the test query, as expected, and the web search works. Time for sweet sleep, though I would still like to know what the issue with Haystack is here.
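
For what it's worth, here is a minimal sketch of why an incomplete narrowed set would empty the results, assuming the filter step behaves like a set intersection (keep only hits that also appear in the narrowed results). The function name and the document identifiers are made up for illustration; this is not Haystack's or Whoosh's actual code, and it does not explain why the narrowed set comes back incomplete in the first place.

```python
def apply_narrowing(raw_hits, narrowed_hits):
    """Keep only the raw hits that also appear in the narrowed result set."""
    allowed = set(narrowed_hits)
    return [hit for hit in raw_hits if hit in allowed]


# Hypothetical identifiers: one hit for the search term...
raw = ["members.member.3"]
# ...and a narrowed set that, for whatever reason, contains events only.
narrowed = ["events.event.%d" % i for i in range(1, 9)]

print(apply_narrowing(raw, narrowed))        # prints: []
print(apply_narrowing(raw, narrowed + raw))  # prints: ['members.member.3']
```

Skipping the filter step corresponds to returning the raw hits untouched, which is consistent with the hit showing up once the two lines are commented out.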

Vikesh
  • I am not satisfied with this approach, mostly because I do not understand the reason for the problem or why commenting out those lines solved it. I have already spent a lot of time debugging this error, and I don't feel like working through the complete architecture of Haystack. Insights will be much appreciated. For the time being, I am accepting this answer. – Vikesh Apr 05 '11 at 15:55