0

I have django project with django-haystack and xapian backend. I have created an index:

sugar=indexes.DecimalField(null=True)

Now I'm trying to filter the SearchQuery with different arithmetical operators, and I'm getting strange results (trying lt, lte, gt and gte with filter like: sqs.filter(sugar__lt=60)). I'm then filtering out None's, sorting the results and converting to float to get a better view on it.

sqs.count()=130, len(sqs_lt)=114, len(sqs_gt)=130, len(sqs_lte)=0, len(sqs_gte)=16
vals_lt=[0.1, 0.1, 0.3, 0.3, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.7, 0.8, 0.8, 0.8, 0.8, 1.0, 1.1, 1.1, 1.1, 1.1, 1.1, 1.3, 1.33, 1.4, 1.5, 1.8, 1.8, 2.0, 2.0, 2.0, 2.0, 2.1, 2.2, 2.2, 2.4, 2.5, 2.5, 2.78, 2.8, 2.9, 3.0, 3.07, 3.2, 3.2, 3.4, 3.6, 3.9, 4.0, 4.9, 6.0, 6.0, 6.2, 6.3, 6.3, 6.6, 12.0, 12.0, 12.4, 12.5, 13.7, 14.0, 18.0, 20.0, 22.0, 22.0, 23.0, 24.1, 24.8, 26.0, 26.0, 26.5, 27.7, 30.96, 34.0, 39.0, 40.0, 40.0, 41.0, 42.2, 42.5, 43.0, 46.0, 48.0, 51.1, 52.0, 54.0, 100.0]
vals_gt=[0.1, 0.1, 0.3, 0.3, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.7, 0.8, 0.8, 0.8, 0.8, 1.0, 1.1, 1.1, 1.1, 1.1, 1.1, 1.3, 1.33, 1.4, 1.5, 1.8, 1.8, 2.0, 2.0, 2.0, 2.0, 2.1, 2.2, 2.2, 2.4, 2.5, 2.5, 2.78, 2.8, 2.9, 3.0, 3.07, 3.2, 3.2, 3.4, 3.6, 3.9, 4.0, 4.9, 6.0, 6.0, 6.2, 6.3, 6.3, 6.6, 7.25, 7.5, 8.2, 8.5, 8.5, 9.3, 12.0, 12.0, 12.4, 12.5, 13.7, 14.0, 18.0, 20.0, 22.0, 22.0, 23.0, 24.1, 24.8, 26.0, 26.0, 26.5, 27.7, 30.96, 34.0, 39.0, 40.0, 40.0, 41.0, 42.2, 42.5, 43.0, 46.0, 48.0, 51.1, 52.0, 54.0, 60.0, 60.0, 63.0, 67.0, 68.0, 70.0, 71.0, 72.0, 72.0, 90.0, 100.0]
vals_lte=[]
vals_gte=[7.25, 7.5, 8.2, 8.5, 8.5, 9.3, 60.0, 60.0, 63.0, 67.0, 68.0, 70.0, 71.0, 72.0, 72.0, 90.0]

I suspect that __lt works almost correctly (except for inexplicable 100), but __gte looks more like lexicographical comparison. What am I missing?

aikipooh
  • 137
  • 1
  • 19
  • Normally filtering with `sugar__lt=60` will filter out nones. – Willem Van Onsem Jun 24 '23 at 15:26
  • perhaps a decimal field uses text as backend, and thus somehow thus seems to filter lexicographically, that would explain 100, since `60 < 100`, but `'100' < '60'`. – Willem Van Onsem Jun 24 '23 at 15:27
  • I have also price set as the same Decimal index. That's how oscar shop module has it, so I was relying on it. But maybe it doesn't work as expected too:). Trying to change index to float and play with it now. And I think __lt has retained None's from what I remember. – aikipooh Jun 24 '23 at 15:34
  • but it does not matter what `index` field you use, what matters is what the database uses. Plugging in the wrong one at the indexing model will just result in bad conversions, and less expressive ORM, but eventually what counts is the type that the database itself uses. – Willem Van Onsem Jun 24 '23 at 15:39
  • Yes, price behaves the same, __gte comparing lexicographically. Anyway, you're hinting that xapian backend incorrectly handles Decimal index? – aikipooh Jun 24 '23 at 15:45
  • Aha, found an old question, probably it's still like that: https://github.com/notanumber/xapian-haystack/issues/84 – aikipooh Jun 24 '23 at 15:50
  • what index you use has not (much) effect on the backend. If you define a `DecimalField` as index, that is just for Django how to construct queries. If the backend behind it works with different types, then the query Django forms might be interpreted in the wrong way. – Willem Van Onsem Jun 24 '23 at 16:00

1 Answers1

0

I've taken a deeper look. With the current implementation in xapian_backend, Decimal will be converted to string, hence the lexicographical comparison. I don't know how often people use xapian backend (I've chosen it to have something fast to test MVP), but django-oscar's price in the catalogue.product is Decimal as well, and they're optimistic having it as a DecimalIndex for haystack. For the prices under 10 coins it will mostly work though:)

I'll follow up with gory details of float in the same setup.

aikipooh
  • 137
  • 1
  • 19