I have a very large Solr index of 4.5M documents. With the default query parser, the document I want to find appears in the results as it should:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"\"I predict a riot\"",
"rows":"1"}},
"response":{
"numFound":15,"start":0,"docs":[
{
"artist":"Kaiser Chiefs",
"text":"<p>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>And not very sensible either<br>A friend of a friend he got beaten<br>He looked the wrong way at a policeman<br>Would never have happened to Smeaton<br>An old Leodiensian<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>Oh, I try to get to my taxi<br>A man in a tracksuit attacks me<br>He said that he saw it before me<br>Wants to get things a bit gory<br>Girls scrabble round with no clothes on<br>To borrow a pound for a condom<br>If it wasn't for chip fat, they'd be frozen<br>They're not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>Ow!<br><br>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>Not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot</p>",
"_ts":6341730138387906561,
"title":"I predict a riot",
"id":"redacted"}]
}}
However, when I switch to the DisMax query parser with the parameters shown below, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "\"I predict a riot\"",
"defType": "dismax",
"ps": "0",
"qf": "text",
"echoParams": "all",
"pf": "text^5",
"wt": "json"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
Nothing. If I remove the quotes, it returns some very irrelevant results (songs by an artist called "I"). In case it isn't clear, "I predict a riot" appears several times in the text field of this document.
I'm a Solr newbie and I don't understand what is wrong with this query. I tried changing qf and pf to "artist text title", but that didn't help.
Ideally, the goal is to find matches in all three fields, with a large boost when all the words appear in the same order in the title, the artist or the text. But even this simple test doesn't seem to work. :-/
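To make the goal concrete, here is a minimal sketch in Python of the request I am aiming for. The host, the core name (`lyrics`) and the boost values are placeholders made up for illustration. Only the parameter names (`defType`, `qf`, `pf`, `ps`) come from my actual requests. It shows the intent only, not a working fix.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/lyrics/select"


def build_dismax_params(user_query):
    """Build DisMax parameters that search three fields and boost phrase matches."""
    return {
        "q": user_query,
        "defType": "dismax",
        # Fields in which each individual query term is searched.
        "qf": "title artist text",
        # Fields in which the whole query is also tried as a phrase.
        # The boost values are arbitrary assumptions.
        "pf": "title^10 artist^10 text^5",
        # Phrase slop 0: the words must be adjacent and in order.
        "ps": "0",
        "wt": "json",
    }


def build_request_url(user_query):
    """Return the full select URL with URL-encoded parameters."""
    return SOLR_SELECT_URL + "?" + urlencode(build_dismax_params(user_query))


if __name__ == "__main__":
    print(build_request_url("I predict a riot"))
```

The query itself is sent without surrounding quotes here, since the phrase boost is meant to come from `pf` rather than from an explicit phrase query.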
Thanks!
Edit: With these params:
"params": {
"q": "I predict a riot",
"defType": "dismax",
"qf": "text artist title",
"echoParams": "all",
"pf": "text^5",
"rows": "100",
"wt": "json"
}
I get this debug output:
"debug": {
"rawquerystring": "I predict a riot",
"querystring": "I predict a riot",
"parsedquery": "(+(DisjunctionMaxQuery((text:I | title:I | artist:I)) DisjunctionMaxQuery((text:predict | title:predict | artist:predict)) DisjunctionMaxQuery((text:a | title:a | artist:a)) DisjunctionMaxQuery((text:riot | title:riot | artist:riot))) DisjunctionMaxQuery(((text:I predict a riot)^5.0)))/no_coord",
"parsedquery_toString": "+((text:I | title:I | artist:I) (text:predict | title:predict | artist:predict) (text:a | title:a | artist:a) (text:riot | title:riot | artist:riot)) ((text:I predict a riot)^5.0)",
"QParser": "DisMaxQParser",
"altquerystring": null,
"boostfuncs": null
}
I'm getting awful results, e.g. songs by an artist called "I", but not the Kaiser Chiefs song, which has the query as its title and several times in its text.
Field definitions:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="artist" type="string" indexed="true" stored="true"/>
<field name="text" type="string" indexed="true" stored="true"/>