I am using an "OR" query in Sphinx, and I want to find out which query words produced each result. For example, I search for "apple | banana | pear" to get the docs that contain any of these fruit words. Normally I get back a list of doc IDs that contain these words, sorted by some ranking strategy.

The question is: for each doc returned (the top 10 results), I also want to know exactly which words the doc contains. In other words, when doc#3 appears in the list, I also want to know that "apple" and "pear" occur in it. I am using the Python API for Sphinx; is there an efficient way to achieve this? To simplify the problem, I can use numbers instead of the fruit names.
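To make the desired output concrete, here is a minimal sketch of one possible approach: run one search per term and record which terms hit each of the top doc IDs. The function `terms_per_doc` and the `search_ids` callable are hypothetical names for illustration; `search_ids` stands in for whatever call returns the matching doc IDs for a single term, and the dictionary below is a fake index, not a Sphinx call.

```python
from collections import defaultdict

def terms_per_doc(terms, search_ids, top_ids):
    """Map each doc id in top_ids to the query terms that matched it.

    search_ids is a hypothetical callable (term -> iterable of doc ids);
    in practice it would wrap a single-term search against the index.
    """
    wanted = set(top_ids)
    matched = defaultdict(list)
    for term in terms:
        for doc_id in search_ids(term):
            if doc_id in wanted:
                matched[doc_id].append(term)
    # Preserve the ranking order of top_ids in the result
    return {doc_id: matched.get(doc_id, []) for doc_id in top_ids}

# Fake index used only to demonstrate the function
fake_index = {"apple": [1, 3], "banana": [2], "pear": [3, 4]}
result = terms_per_doc(
    ["apple", "banana", "pear"],
    lambda term: fake_index.get(term, []),
    [3, 2, 4],
)
print(result)  # {3: ['apple', 'pear'], 2: ['banana'], 4: ['pear']}
```

This only works if each per-term search is guaranteed to cover the top doc IDs (for example by restricting it to those IDs or raising the result limit), and it costs one extra search per term.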

Artjom B.
GameBoy
  • I think that if I can get the content of a doc, the problem is easy to solve by computing the intersection with the query words. I use an xmlpipe2 source; is there any API for getting a doc's content? – GameBoy Oct 29 '13 at 04:35
  • In short, no. The Sphinx index does not contain the original content. It's processed and indexed, but not stored. To get the original text, go back to the source your xmlpipe reads it from. – barryhunter Oct 29 '13 at 11:10
  • Having said that, there are attributes http://sphinxsearch.com/docs/current.html#attributes which ARE stored in the index, so you can retrieve them. But all attributes must be held in memory, so it's not usual to store the whole document text there. If you are only talking about some short columns, you might be OK. – barryhunter Oct 29 '13 at 11:11
  • I've tried treating the content as an attribute, and I can get the content now. But performance is very bad because of the volume of content. I wonder whether I can get "apple" and "pear" for doc#3 directly when I use an "OR" query like this. – GameBoy Oct 30 '13 at 08:11
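
The intersection idea from the comments can be sketched as follows, assuming the document text has already been fetched from the original source (not from Sphinx). The helper `words_in_doc` is a hypothetical name, and its tokenization is a naive lowercase word split; it does not reproduce Sphinx's own charset or stemming settings, so results can differ from what the index actually matched.

```python
import re

def words_in_doc(query, text):
    """Return the terms of an 'a | b | c' OR query that occur in text."""
    # Split the OR query on '|' and normalize each term
    terms = [t.strip().lower() for t in query.split("|") if t.strip()]
    # Naive tokenization: lowercase runs of word characters
    tokens = set(re.findall(r"\w+", text.lower()))
    return [t for t in terms if t in tokens]

found = words_in_doc("apple | banana | pear", "An apple and a pear on the table.")
print(found)  # ['apple', 'pear']
```

Because this runs only on the top 10 results, the cost is bounded by fetching and tokenizing ten documents per search.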

0 Answers