I am building a small search engine using Django Haystack + Elasticsearch + Django REST Framework, and I'm trying to figure out how to reproduce the behavior of a Django QuerySet's distinct method.
My index looks something like this:
from haystack import indexes

class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    item_id = indexes.IntegerField(faceted=True)

    def prepare_item_id(self, obj):
        return obj.item_id
What I'd like to be able to do is the following:
sqs = SearchQuerySet().filter(content=my_search_query).distinct('item_id')
However, Haystack's SearchQuerySet doesn't have a distinct method, so I'm kind of lost. I tried faceting the field and then querying Django using the returned list of item_ids, but this loses the performance of Elasticsearch and also makes it impossible to use Elasticsearch's sorting features.
Any thoughts?
EDIT:
Example data:
Item Model
==========
id   title
1    'Item 1'
2    'Item 2'
3    'Item 3'

VendorItem Model << the table in question
================
id   item_id   vendor_id   lat    lon
1    1         1           38     -122
2    2         1           38.2   -121.8
3    3         2           37.9   -121.9
4    1         2           ...    ...
5    2         2           ...    ...
6    2         3           ...    ...
As you can see, there are multiple VendorItems for the same Item. However, when searching I only want to retrieve at most one result for each item, so I need the item_id column to be unique/distinct.
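To make the behavior I'm after concrete, here is a minimal sketch in plain Python. It is not a Haystack or Elasticsearch solution: the `distinct_by` helper and the `hits` list are hypothetical, and the rows just mirror the VendorItem example data, assumed to be already sorted by relevance.

```python
def distinct_by(results, key):
    """Keep the first result seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for result in results:
        value = result[key]
        if value not in seen:
            seen.add(value)
            unique.append(result)
    return unique


# Hypothetical search hits in ranked order (rows mirror the VendorItem table).
hits = [
    {"id": 1, "item_id": 1, "vendor_id": 1},
    {"id": 2, "item_id": 2, "vendor_id": 1},
    {"id": 4, "item_id": 1, "vendor_id": 2},
    {"id": 3, "item_id": 3, "vendor_id": 2},
    {"id": 6, "item_id": 2, "vendor_id": 3},
]

# At most one VendorItem per item_id, keeping the best-ranked one.
unique_hits = distinct_by(hits, "item_id")
```

Ideally the search backend would do this itself, so that sorting and pagination still apply to the deduplicated results.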
I have tried faceting on the item_id column, and then executing the following query:
# Facet on item_id so the facet counts list each distinct item_id
sqs = SearchQuerySet().filter(content=query).facet('item_id')
counts = sqs.facet_counts()
# ids will look like: [345, 892, 123, 34,...]
ids = [i[0] for i in counts['fields']['item_id']]
# Fetch one VendorItem per item from the database, within the bounding box
items = VendorItem.objects.filter(
    vendor__lat__gte=latMin, vendor__lon__gte=lonMin,
    vendor__lat__lte=latMax, vendor__lon__lte=lonMax,
    item_id__in=ids,
).distinct('item').select_related('vendor', 'item')
The main problem here is that the results are limited to 100 items, and they cannot be sorted with Haystack.