I'm not an expert in Solr and I'm trying it out to check its capabilities.

I'm seeing odd behaviour: I get good results when my text search query has at most 3 words, and zero results when the query is longer.

What I did:

  1. Created a Docker container running Solr, with a core named my_core:

    docker run -d -p 8983:8983 --name my_solr solr solr-precreate my_core

  2. In the dashboard, created a new field named campo_teste with the type text_pt, because I need to index a dataset of Portuguese texts.

  3. Added and indexed my corpus with pysolr.

  4. Now at query time, when I search for "subsídio de parentalidade" I get results that make sense.

  5. But if I use a longer sentence I get zero results, for example with the same query as before embedded in the longer sentence "quando posso pedir o subsídio de parentalidade?".

Any ideas of what might be causing this issue?

Miguel

2 Answers

You're not searching in the same field for all your values; in the first example you're searching for subsídio in the campo_text field and de parentalidade in the default search field (since you didn't prefix those values with a field name).

In your second example you're searching for quando in the campo_text field and posso pedir o subsídio de parentalidade in the default search field (since you're not prefixing those values).

In effect, subsídio is present in campo_text while quando is not. The default search field (_text_ unless configured otherwise) probably has no content, so the remaining words match nothing and no hits are produced.
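To make the field assignment described above concrete, here is a minimal Python sketch. It is a deliberately simplified model, not Solr's actual parser: it only splits on whitespace and assumes the default field is _text_. The function name assign_fields is made up for illustration.

```python
from typing import List, Tuple

# Assumption: the core uses Solr's usual default search field.
DEFAULT_FIELD = "_text_"


def assign_fields(query: str, default_field: str = DEFAULT_FIELD) -> List[Tuple[str, str]]:
    """Simplified model: a `field:` prefix applies only to the term it is attached to."""
    pairs = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            # Explicit prefix, e.g. campo_text:subsídio
            pairs.append((field, term))
        else:
            # No prefix: the term falls back to the default field
            pairs.append((default_field, token))
    return pairs


for q in ("campo_text:subsídio de parentalidade",
          "campo_text:quando posso pedir o subsídio de parentalidade"):
    print(assign_fields(q))
```

In both queries only the first word is looked up in campo_text; every other word goes to the default field.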

If you want to support general user queries, it's usually a better idea to use the edismax query parser with the qf (query fields) setting:

    q=quando posso pedir o subsídio de parentalidade&defType=edismax&qf=campo_text

This will search campo_text using all the words. You can then use q.op=AND or q.op=OR to control whether all words need to be present, or you can use mm (minimum match) to tune the matching profile in more detail.
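As a rough sketch of how those parameters could be assembled from Python, the snippet below builds the request URL with the standard library only. The host localhost is an assumption (the port and core name come from the docker command in the question), and edismax_params is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable on localhost; port and core are from the question.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_core/select"


def edismax_params(user_query: str, field: str = "campo_text", operator: str = "OR") -> dict:
    """Build the parameters for an edismax search over a single field."""
    return {
        "q": user_query,       # raw user text, no field prefix needed
        "defType": "edismax",  # switch to the edismax query parser
        "qf": field,           # field(s) that every word is searched in
        "q.op": operator,      # "AND" requires all words, "OR" any word
    }


params = edismax_params("quando posso pedir o subsídio de parentalidade")
print(SOLR_SELECT_URL + "?" + urlencode(params))
```

With pysolr, the same parameters (apart from q) can be passed as keyword arguments to search().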

MatsLindh

The issue is due to how Solr parses the query. MatsLindh's answer explains how Solr assigns the words of a query to fields. If you want to search a field, for example campo_text, for the text:

I want a burguer.

Then the parsed query inside Solr should be

    parsedquery: 'campo_text:I campo_text:want campo_text:a campo_text:burguer'

(the parsed query is included in the response when you pass the debug=all parameter)

On my end, I tried the solution provided by MatsLindh, but noticed that setting defType=edismax turns the query into the following:

    {'rawquerystring': 'I want a burguer',
     'querystring': 'I want a burguer',
     'parsedquery': '+(DisjunctionMaxQuery((text:i)) DisjunctionMaxQuery((text:want)) DisjunctionMaxQuery((text:a)) DisjunctionMaxQuery((text:burguer)))',
     'parsedquery_toString': '+((text:i) (text:want) (text:a) (text:burguer))'}
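For reference, a debug block like the one above can be pulled out of a JSON response roughly as follows. This is a sketch: debug_params and parsed_query are hypothetical helper names, and the sample response is hand-written from the values shown above rather than fetched from a live Solr instance.

```python
def debug_params(query: str) -> dict:
    """Parameters that ask Solr to include parsing details in the response."""
    return {"q": query, "debug": "all", "wt": "json"}


def parsed_query(response: dict) -> str:
    """Return the human-readable parsed query, or '' if debug info is absent."""
    return response.get("debug", {}).get("parsedquery_toString", "")


# Hand-written sample mirroring the debug output shown above.
sample_response = {
    "debug": {
        "rawquerystring": "I want a burguer",
        "querystring": "I want a burguer",
        "parsedquery_toString": "+((text:i) (text:want) (text:a) (text:burguer))",
    }
}

print(debug_params("I want a burguer"))
print(parsed_query(sample_response))
```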

My implementation is in Python and, luckily, there is a package named solrq that lets you build queries against the fields you want to search using its Q class. In my example I used Q(text='I want a burguer'). Debugging the same query, I now get:

    {'rawquerystring': 'text:I\\ want\\ a\\ burguer',
     'querystring': 'text:I\\ want\\ a\\ burguer',
     'parsedquery': 'text:i text:want text:a text:burguer',
     'parsedquery_toString': 'text:i text:want text:a text:burguer'}
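The raw query string above has a single field prefix followed by text whose spaces are backslash-escaped, so the whole sentence stays attached to that field. The sketch below reproduces that shape with the standard library only. It is a stand-in for illustration, not solrq's implementation, and solrq's own escaping rules may differ. field_query is a hypothetical name.

```python
import re

# Characters with special meaning in Solr query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/\s])')


def field_query(field: str, text: str) -> str:
    """Prefix `text` with `field:` and backslash-escape special characters."""
    escaped = _SPECIAL.sub(r"\\\1", text)
    return f"{field}:{escaped}"


print(field_query("text", "I want a burguer"))
```

Because the spaces are escaped, the query parser does not split the sentence into separate clauses before the field's analyzer sees it.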

I tested both ways of building the search query (defType='edismax' and the Q parser) in an experiment where I measure how often the correct document appears among the top k retrieved documents. On my data, the Q parser gave better results:

                          top_1   top_3   top_5   top_10
    Q_parser_bm25         0.3054  0.4469  0.4988  0.5649
    defType_edismax_bm25  0.2736  0.4009  0.4493  0.4988
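For readers who want to run a similar comparison, a top-k accuracy of this kind can be computed roughly as below. This is a sketch under the assumption that each query has exactly one correct document. The document ids are toy values and are unrelated to the numbers in the table.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_ids: Sequence[Sequence[str]], relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct document is among the first k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(
        1 for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_ids)


# Toy data: one ranked result list per query, and the correct id for each query.
ranked: List[List[str]] = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant: List[str] = ["d1", "d5", "d0"]

for k in (1, 3):
    print(f"top_{k}", round(top_k_accuracy(ranked, relevant, k), 4))
```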