Solr text search not working with long queries

Question

I'm not an expert in Solr and I'm trying it out to check its capabilities.

I'm seeing odd behaviour: I get good results if my text search query has at most 3 words, and zero results if the query is longer.

What I did:

Created a Docker container with Solr and a core named my_core:

docker run -d -p 8983:8983 --name my_solr solr solr-precreate my_core
In the dashboard, created a new field named campo_teste with the type text_pt, because I need to index a dataset of Portuguese texts.
Added and indexed my corpus with pysolr.
Now at query time, when I search for "subsídio de parentalidade" I get results that make sense:

But if I use a longer sentence, I get zero results. This is an example with the same query as before, but inside the longer sentence "quando posso pedir o subsídio de parentalidade?":

Any ideas of what might be causing this issue?

MatsLindh · Accepted Answer · 2023-05-30T08:09:26.467

You're not searching in the same field for all your values; in the first example you're searching for subsídio in the campo_text field and de parentalidade in the default search field (since you didn't prefix those values with a field name).

In your second example you're searching for quando in the campo_text field and posso pedir o subsídio de parentalidade in the default search field (since you're not prefixing those values).

In effect, subsídio is present in campo_text, while quando is not - the default search field (by default _text_) probably has no content, so no hits are produced.
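To make the point above concrete, here is a deliberately simplified sketch of how a field prefix attaches only to the term it is written on. This is not Solr's actual parser: the helper name assign_fields is made up for illustration, and it ignores quotes, parentheses and operators. It only assumes, as described above, that the default search field is _text_.

```python
def assign_fields(query, default_field="_text_"):
    """Pair each whitespace-separated term with the field it would be searched in.

    Simplified illustration only: ignores quotes, parentheses and operators,
    so it is not a substitute for Solr's real query parser.
    """
    pairs = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            # The prefix applies to this single term only.
            pairs.append((field, term))
        else:
            # No prefix: the term falls back to the default search field.
            pairs.append((default_field, token))
    return pairs


for field, term in assign_fields("campo_text:quando posso pedir o subsídio de parentalidade?"):
    print(f"{field}:{term}")
```

Only the first term ends up in campo_text; everything else goes to the default field. With the standard parser, grouping the terms in parentheses, as in campo_text:(quando posso pedir), applies the field to the whole group.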

If you want to support general user queries, it's usually a better idea to use the edismax query handler with the qf (query fields) setting:

q=quando posso pedir o subsídio de parentalidade&defType=edismax&qf=campo_text

This will search campo_text using all the words. You can then use q.op=AND or q.op=OR to control whether all words need to be present or not, or you can use mm (minimum match) to tune the matching behaviour in more detail.
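If you are building the request from Python, the parameters above could be assembled roughly like this. It is a minimal sketch: the URL assumes the local Docker setup and the my_core core from the question, and build_edismax_params is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode

# Assumed local setup from the question (Docker on port 8983, core my_core).
SOLR_SELECT_URL = "http://localhost:8983/solr/my_core/select"


def build_edismax_params(text, field="campo_text", op="OR"):
    """Return request parameters that search `field` with every word in `text`."""
    return {
        "q": text,             # raw user text, no field prefixes needed
        "defType": "edismax",  # use the edismax query parser
        "qf": field,           # field(s) that all words are searched in
        "q.op": op,            # "AND" requires every word, "OR" any word
    }


params = build_edismax_params("quando posso pedir o subsídio de parentalidade")
# urlencode takes care of escaping spaces and accented characters.
print(f"{SOLR_SELECT_URL}?{urlencode(params)}")
```

With pysolr, the same keys can be passed as extra keyword arguments to search, for example solr.search(text, **{"defType": "edismax", "qf": "campo_text"}).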

"_all the worse_" -> "_all the words_"? – andrewJames May 29 '23 at 21:35 — andrewJames, May 29 '23 at 21:35

Itchy_Analyzer · Answer 2 · 2023-05-30T10:50:09.310

The issue is due to the way Solr parses the query. The answer from MatsLindh explains how Solr searches for words in a field. Suppose you want to search a field, for example campo_text, for the text:

I want a burguer.

Then the parsed query inside Solr should be

parsedquery: 'campo_text:I campo_text:want campo_text:a campo_text:burguer'

(this type of query can be accessed when using the debug=all parameter)
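As a small sketch of how to get at this from Python: when debugging is enabled, the JSON response carries a top-level debug section containing parsedquery. The sample dictionary below is hand-written with placeholder values, not real server output, and parsed_query is a hypothetical helper name.

```python
def parsed_query(response):
    """Return the parsed query string from a Solr JSON response, or None if absent."""
    return response.get("debug", {}).get("parsedquery")


# Hand-written sample shaped like a Solr JSON response (placeholder values,
# not copied from a real server).
sample = {
    "response": {"numFound": 0, "docs": []},
    "debug": {
        "rawquerystring": "field_a:foo bar",
        "parsedquery": "field_a:foo field_b:bar",
    },
}

print(parsed_query(sample))
```

Comparing this string against what you expected is usually the quickest way to see which field each term actually ended up in.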

On my end, I tried the solution provided by MatsLindh, but noticed that setting defType=edismax turns the query into the following:

{'rawquerystring': 'I want a burguer',
  'querystring': 'I want a burguer',
  'parsedquery': '+(DisjunctionMaxQuery((text:i)) DisjunctionMaxQuery((text:want)) DisjunctionMaxQuery((text:a)) DisjunctionMaxQuery((text:burguer)))',
  'parsedquery_toString': '+((text:i) (text:want) (text:a) (text:burguer))'}

My implementation is in Python, and luckily there is a package named solrq that lets you build a query against the fields you want to search using the Q class. In my example I used Q(text='I want a burguer'). Debugging the same query, I now get:

{'rawquerystring': 'text:I\\ want\\ a\\ burguer',
  'querystring': 'text:I\\ want\\ a\\ burguer',
  'parsedquery': 'text:i text:want text:a text:burguer',
  'parsedquery_toString': 'text:i text:want text:a text:burguer'}

I tested both ways of building the search query (defType='edismax' and the solrq Q class, labelled Q_parser below) in an experiment that measures how often the correct document appears in the top k retrieved documents, and I obtained better results with the Q class on my data:

                      top_1   top_3   top_5   top_10
Q_parser_bm25         0.3054  0.4469  0.4988  0.5649
defType_edismax_bm25  0.2736  0.4009  0.4493  0.4988
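For reference, a metric of this kind can be computed along the following lines. This is only a sketch under my own assumptions: exactly one correct document per query, and the toy document IDs are hypothetical. It is not the exact evaluation code behind the numbers above.

```python
def top_k_accuracy(ranked_ids, correct_ids, k):
    """Fraction of queries whose correct document appears in the first k results.

    ranked_ids:  one list of document IDs per query, best match first
    correct_ids: the single correct document ID for each query (assumption)
    """
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, correct in zip(ranked_ids, correct_ids)
        if correct in ranked[:k]
    )
    return hits / len(ranked_ids)


# Hypothetical toy data: three queries with their ranked results.
ranked = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d9", "d8", "d7"]]
correct = ["d1", "d4", "d0"]

for k in (1, 3):
    print(f"top_{k}: {top_k_accuracy(ranked, correct, k):.4f}")
```

Running the same function over the results of each query style gives directly comparable columns like the ones in the table.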
