I have the following problem: I search for a term and get results, and that part works fine. However, if the term occurs in the Solr index as part of a hyphenated word, the document containing that word always gets a higher score and is shown at the top of the results.

To test this, I took the third entry in my search results and changed the non-hyphenated word to a hyphenated one. After reindexing the document and searching for the same term, I expected the same scoring as before. Instead, the document in which I changed the word is now in first place.

The text field type looks like this in my schema.xml:

 <fieldType name="text" class="solr.TextField" sortMissingLast="true"  positionIncrementGap="100">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-de.txt" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="1" stemEnglishPossessive="1" preserveOriginal="1" />
      <filter class="solr.GermanNormalizationFilterFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" />
   </analyzer>
 </fieldType>

Does anyone know why this produces different results? I would appreciate any help.

UPDATE: I ran the search query for "Meyer" before the word was hyphenated and got the following result:

<lst name="debug">
  <str name="rawquerystring">Meyer</str>
  <str name="querystring">Meyer</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((content:meyer | title:meyer | keywords:meyer | h1:meyer | description:meyer | browsertitle:meyer^3)))/no_coord</str>
  <str name="parsedquery_toString">+(content:meyer | title:meyer | keywords:meyer | h1:meyer | description:meyer | browsertitle:meyer^3)</str>
  <lst name="explain">
    <str name="ID1">
2.1717649 = max of:
  0.471918 = weight(content:meyer in 26) [DefaultSimilarity], result of:
    0.471918 = score(doc=26,freq=4.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.4317087 = fieldWeight in 26, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=26)
  0.9652289 = weight(title:meyer in 26) [DefaultSimilarity], result of:
    0.9652289 = score(doc=26,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 26, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=26)
  0.9652289 = weight(description:meyer in 26) [DefaultSimilarity], result of:
    0.9652289 = score(doc=26,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 26, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=26)
  2.1717649 = weight(browserTitle:meyer^3.0 in 26) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 26, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=26)
</str>
    <str name="ID2">
2.1717649 = max of:
  0.471918 = weight(content:meyer in 222) [DefaultSimilarity], result of:
    0.471918 = score(doc=222,freq=4.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.4317087 = fieldWeight in 222, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=222)
  0.9652289 = weight(title:meyer in 222) [DefaultSimilarity], result of:
    0.9652289 = score(doc=222,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 222, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=222)
  0.9652289 = weight(description:meyer in 222) [DefaultSimilarity], result of:
    0.9652289 = score(doc=222,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 222, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=222)
  2.1717649 = weight(browserTitle:meyer^3.0 in 222) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 222, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=222)
</str>
    <str name="ID3">
2.1717649 = max of:
  0.471918 = weight(content:meyer in 234) [DefaultSimilarity], result of:
    0.471918 = score(doc=234,freq=4.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.4317087 = fieldWeight in 234, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=234)
  0.9652289 = weight(title:meyer in 234) [DefaultSimilarity], result of:
    0.9652289 = score(doc=234,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 234, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=234)
  0.9652289 = weight(description:meyer in 234) [DefaultSimilarity], result of:
    0.9652289 = score(doc=234,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 234, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=234)
  2.1717649 = weight(browserTitle:meyer^3.0 in 234) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 234, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=234)
</str>
</lst>

Then I changed "Meyer" to "Meyer-Landrut" in the third result, reindexed, and ran the search again, with this result:

<lst name="debug">
  <str name="rawquerystring">Meyer</str>
  <str name="querystring">Meyer</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((content:meyer | title:meyer | keywords:meyer | h1:meyer | description:meyer | browsertitle:meyer^3)))/no_coord</str>
  <str name="parsedquery_toString">+(content:meyer | title:meyer | keywords:meyer | h1:meyer | description:meyer | browsertitle:meyer^3)</str>
  <lst name="explain">
    <str name="ID3">
2.5594494 = max of:
  0.5276203 = weight(content:meyer in 1767) [DefaultSimilarity], result of:
    0.5276203 = score(doc=1767,freq=5.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.600699 = fieldWeight in 1767, product of:
        2.236068 = tf(freq=5.0), with freq of:
          5.0 = termFreq=5.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=1767)
  1.0237797 = weight(title:meyer in 1767) [DefaultSimilarity], result of:
    1.0237797 = score(doc=1767,freq=2.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      3.0713391 = fieldWeight in 1767, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.375 = fieldNorm(doc=1767)
  1.1944097 = weight(description:meyer in 1767) [DefaultSimilarity], result of:
    1.1944097 = score(doc=1767,freq=2.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      3.583229 = fieldWeight in 1767, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.4375 = fieldNorm(doc=1767)
  2.5594494 = weight(browserTitle:meyer^3.0 in 1767) [DefaultSimilarity], result of:
    2.5594494 = fieldWeight in 1767, product of:
      1.4142135 = tf(freq=2.0), with freq of:
        2.0 = termFreq=2.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.3125 = fieldNorm(doc=1767)
</str>
    <str name="ID4">
2.1717649 = max of:
  0.40869296 = weight(content:meyer in 286) [DefaultSimilarity], result of:
    0.40869296 = score(doc=286,freq=3.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.239896 = fieldWeight in 286, product of:
        1.7320508 = tf(freq=3.0), with freq of:
          3.0 = termFreq=3.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=286)
  0.9652289 = weight(title:meyer in 286) [DefaultSimilarity], result of:
    0.9652289 = score(doc=286,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 286, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=286)
  0.9652289 = weight(description:meyer in 286) [DefaultSimilarity], result of:
    0.9652289 = score(doc=286,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 286, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=286)
  2.1717649 = weight(browserTitle:meyer^3.0 in 286) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 286, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=286)
</str>
    <str name="ID5">
2.1717649 = max of:
  0.40869296 = weight(content:meyer in 436) [DefaultSimilarity], result of:
    0.40869296 = score(doc=436,freq=3.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.239896 = fieldWeight in 436, product of:
        1.7320508 = tf(freq=3.0), with freq of:
          3.0 = termFreq=3.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=436)
  0.9652289 = weight(title:meyer in 436) [DefaultSimilarity], result of:
    0.9652289 = score(doc=436,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 436, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=436)
  0.9652289 = weight(description:meyer in 436) [DefaultSimilarity], result of:
    0.9652289 = score(doc=436,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 436, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=436)
  2.1717649 = weight(browserTitle:meyer^3.0 in 436) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 436, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=436)
</str>

...

    <str name="ID1">
2.1717649 = max of:
  0.471918 = weight(content:meyer in 1174) [DefaultSimilarity], result of:
    0.471918 = score(doc=1174,freq=4.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.4317087 = fieldWeight in 1174, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=1174)
  0.9652289 = weight(title:meyer in 1174) [DefaultSimilarity], result of:
    0.9652289 = score(doc=1174,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 1174, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=1174)
  0.9652289 = weight(description:meyer in 1174) [DefaultSimilarity], result of:
    0.9652289 = score(doc=1174,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 1174, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=1174)
  2.1717649 = weight(browserTitle:meyer^3.0 in 1174) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 1174, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=1174)
</str>
    <str name="ID2">
2.1717649 = max of:
  0.471918 = weight(content:meyer in 1766) [DefaultSimilarity], result of:
    0.471918 = score(doc=1766,freq=4.0), product of:
      0.32961872 = queryWeight, product of:
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.057556875 = queryNorm
      1.4317087 = fieldWeight in 1766, product of:
        2.0 = tf(freq=4.0), with freq of:
          4.0 = termFreq=4.0
        5.726835 = idf(docFreq=15, maxDocs=1807)
        0.125 = fieldNorm(doc=1766)
  0.9652289 = weight(title:meyer in 1766) [DefaultSimilarity], result of:
    0.9652289 = score(doc=1766,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 1766, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=1766)
  0.9652289 = weight(description:meyer in 1766) [DefaultSimilarity], result of:
    0.9652289 = score(doc=1766,freq=1.0), product of:
      0.33333334 = queryWeight, product of:
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.057556875 = queryNorm
      2.8956866 = fieldWeight in 1766, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        5.7913733 = idf(docFreq=14, maxDocs=1807)
        0.5 = fieldNorm(doc=1766)
  2.1717649 = weight(browserTitle:meyer^3.0 in 1766) [DefaultSimilarity], result of:
    2.1717649 = fieldWeight in 1766, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      5.7913733 = idf(docFreq=14, maxDocs=1807)
      0.375 = fieldNorm(doc=1766)
</str>
</lst>

After changing the word, the results that were previously in first and second place now appear at the end of the result list. It seems the queue was reordered and they now sit at the end of it, behind the new first result. How is that possible? And how can I make the ordering more random, so that the document with the new hyphenated word doesn't appear at the top of the list but in third place, as in the first search?
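Reading the two explain outputs side by side, the winning `browserTitle` clause for ID3 changes in only two places: `termFreq` goes from 1.0 to 2.0 and `fieldNorm` drops from 0.375 to 0.3125, while `idf` stays the same. The following is a minimal Python sketch that redoes that arithmetic, assuming the `tf = sqrt(freq)` relationship visible in the debug output (for example `2.0 = tf(freq=4.0)`). The function name `field_weight` is made up for illustration and is not part of Solr.

```python
import math

def field_weight(term_freq, idf, field_norm):
    """Hypothetical helper: fieldWeight as shown in the explain output,
    i.e. tf * idf * fieldNorm with tf = sqrt(termFreq)."""
    return math.sqrt(term_freq) * idf * field_norm

# idf(docFreq=14, maxDocs=1807), copied from the debug output
IDF_BROWSER_TITLE = 5.7913733

# ID3 before the change: termFreq=1.0, fieldNorm=0.375
before = field_weight(1.0, IDF_BROWSER_TITLE, 0.375)

# ID3 after the change to "Meyer-Landrut": termFreq=2.0, fieldNorm=0.3125
after = field_weight(2.0, IDF_BROWSER_TITLE, 0.3125)

print(round(before, 4), round(after, 4))
```

The doubled term frequency outweighs the slightly lower field norm, which is why ID3 moves up. One possible cause, not confirmed here, is that the analyzer chain emits the token `meyer` more than once for `Meyer-Landrut`. The Analysis screen in the Solr admin UI should show which tokens are actually produced.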

  • Did you enable the "debug" feature and check how the score is calculated? An updated document would rank higher because it is more recent. If you want more help, please post an example of the query, the document retrieved, the document expected, and the related scores. – AR1 Aug 08 '16 at 09:59
  • When you change the actual token of that single document, the score will change - as the token hit is now compared to a different token in the whole index. If this is the only non-hyphenated version of that token, it'll be seen as very important compared to the other words. Use `debugQuery` to see how the score is calculated. (Solr will not score 'more recent documents' higher by default) – MatsLindh Aug 08 '16 at 10:19
  • I updated my question with the debug output. I understand that the new hyphenated word now looks different to Solr, but to a normal user it doesn't make much sense when the same pattern always shows up: no matter what I search for, hyphenated words always appear at the top (because they are usually rare compared to non-hyphenated words). Is there any way to randomise this and stop Solr from giving rare words a higher score? – middleendian Aug 22 '16 at 10:34
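
On the rarity point: in the debug output in the question, the `idf` values are identical for every document, both before and after the change, so term rarity does not appear to be what separates the results here. The following is a small Python sketch that reproduces those values, assuming the classic Lucene TF-IDF formula `idf = 1 + ln(maxDocs / (docFreq + 1))` used by `DefaultSimilarity`. The function name `classic_idf` is made up for illustration.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Hypothetical helper: classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Values taken from the explain output: idf(docFreq=14, maxDocs=1807)
idf_title = classic_idf(14, 1807)

# idf(docFreq=15, maxDocs=1807), as shown for the content field
idf_content = classic_idf(15, 1807)

print(idf_title, idf_content)
```

Since `idf` depends only on `docFreq` and `maxDocs`, it is the same for every document that matches `meyer` in a given field.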
