
I am trying to query a Solr index using its HTTP API:

http://localhost:8983/solr/documents/select?defType=func&q=termfreq(contents,'hello')&wt=json

I have indexed 3 documents, and 2 of them contain the term "hello", but the query returns all 3 documents:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"termfreq(contents,'hello')",
      "defType":"func",
      "indent":"on",
      "wt":["json",
        "json"],
      "_":"1538568705504"}},
  "response":{"numFound":3,"start":0,"docs":[
      {*here I have docs*}
  ]
  }
}

I expected to get back only the documents that contain the word "hello", along with the number of times it occurs in each of them.

Is my understanding correct, or have I misunderstood how this function works?

computingfreak
Root

1 Answer


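You can't use a function query on its own to filter documents in Solr. A function query matches every document in the index and only supplies the score, so documents without the term are still returned, just with a `termfreq` of 0. To retrieve only the documents that contain the term hello, and get the count as the score, use:

`q=_val_:"termfreq(contents,'hello')"&fq=contents:hello`

The filter query (`fq`) limits the result set to the documents that have hello in the `contents` field without affecting scoring, while `q` invokes the function query parser through the magic `_val_` field. The result of that function is assigned as the score for each document, effectively returning both the documents that match and the count of the given term in those documents. Putting both clauses in `q` (`q=contents:hello _val_:"termfreq(contents,'hello')"`) does not restrict the results with the default OR operator, because the `_val_` clause matches every document.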
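A minimal Python sketch of building this request, assuming the local `documents` core from the question, an `id` uniqueKey field, and a search term that is a single plain word needing no query-syntax escaping. The helper names `termfreq_score_url` and `counts_from_response` are hypothetical. The restriction to matching documents goes in `fq`, which filters without contributing to the score, so each returned score is just the `termfreq` value:

```python
from urllib.parse import urlencode

# Assumption: the local "documents" core from the question.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"


def termfreq_score_url(field: str, term: str) -> str:
    """Build a select URL whose score is termfreq(field, term).

    q is a function query (through the _val_ field), so each document's score
    is the function value; fq keeps only documents containing the term and
    does not change the score. Assumes `term` needs no escaping.
    """
    params = {
        "q": f"_val_:\"termfreq({field},'{term}')\"",
        "fq": f"{field}:{term}",
        "fl": "id,score",  # 'id' assumed to be the uniqueKey field
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def counts_from_response(response: dict) -> dict:
    """Map each returned document id to its score, read as an integer count."""
    return {doc["id"]: int(doc["score"]) for doc in response["response"]["docs"]}
```

Solr returns the score as a float, which is why `counts_from_response` converts it with `int()`.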

You should also be able to use `termfreq(contents,'hello')` directly in your field list (`fl=termfreq(contents,'hello'),score,foo`, where `foo` stands for any other fields you want returned) if you don't want to assign it as the score.
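As a sketch of the field-list variant, again assuming the local `documents` core and an `id` uniqueKey field (the helper name `termfreq_in_fl_url` is hypothetical): the function only appears in `fl`, so the main query is scored normally and the frequency is attached to each document as an extra value.

```python
from urllib.parse import urlencode

# Assumption: the local "documents" core from the question.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"


def termfreq_in_fl_url(field: str, term: str, query: str = "*:*") -> str:
    """Return a select URL that adds termfreq(field, term) to each document.

    The function is only requested in fl (the field list), so it does not
    change the score; `query` is scored normally.
    """
    func = f"termfreq({field},'{term}')"
    params = {
        "q": query,
        "fl": f"id,score,{func}",  # 'id' assumed to be the uniqueKey field
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    # Without an alias, the value is keyed by the function text in each doc.
    print(termfreq_in_fl_url("contents", "hello", query="contents:hello"))
```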

MatsLindh
  • The `_val_` thing makes sense, but what is the purpose of `q` if I am getting back documents both with and without the word **hello**? The `termfreq` only increases the score of a document and shows the highest-scoring document first, but what I need is that score, i.e. the count. Is there any way to get the count? – Root Oct 04 '18 at 05:51
  • The score would be the count. When using the `_val_` syntax, the value returned by the function is assigned as the score field, so the `score` value returned for each document is the number of times the term appears. The purpose of the other term in `q` is to limit the set of documents to those that contain `hello` (i.e. where the termfreq would be more than 0). – MatsLindh Oct 04 '18 at 06:26
  • Is there any way to return the `score` from this query: `q=content:hello _val_:"termfreq(contents,'hello')"`? – Root Oct 04 '18 at 06:29
  • If you want to perform a regularly scored query but attach the termfreq as a separate field, you can use the second syntax I mentioned: use the function in the field list to get it as a field (`fl=termfreq(contents,'hello')`). – MatsLindh Oct 04 '18 at 06:35
  • I am getting nothing with this query: `http://localhost:8983/solr/documents/select?(fl=termfreq(contents'hello'),score)&indent=on&wt=json`. When I remove the `()`, I just get an additional `maxScore: 0` in the response. – Root Oct 04 '18 at 06:42
  • `http://localhost:8983/solr/collection/select?fl=termfreq(contents,hello)&q=*:*` works as it should. You can also use field aliasing to get the value under a more friendly name: `fl=hello_freq:termfreq(contents,hello)&q=*:*`. – MatsLindh Oct 04 '18 at 07:24
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181266/discussion-between-root-and-matslindh). – Root Oct 04 '18 at 07:30
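The aliased field list from the comments can also be combined with a filter query, so that only documents containing the term come back, each carrying its frequency under the alias. A sketch under the same assumptions (local `documents` core, `id` uniqueKey field, a plain single-word term); `aliased_termfreq_params`, `to_url` and `freqs_by_id` are hypothetical helper names:

```python
from urllib.parse import urlencode

# Assumption: the local "documents" core from the question.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"


def aliased_termfreq_params(field: str, term: str, alias: str) -> dict:
    """Request parameters that return termfreq(field, term) under `alias`.

    fl uses Solr's alias syntax (alias:function); fq limits the results to
    documents containing the term, so every returned count is at least 1.
    """
    return {
        "q": "*:*",
        "fq": f"{field}:{term}",
        "fl": f"id,{alias}:termfreq({field},'{term}')",
        "wt": "json",
    }


def to_url(params: dict) -> str:
    """URL-encode the parameters onto the select endpoint."""
    return f"{SOLR_SELECT}?{urlencode(params)}"


def freqs_by_id(response: dict, alias: str) -> dict:
    """Map document id -> term frequency from a parsed JSON response."""
    return {doc["id"]: doc[alias] for doc in response["response"]["docs"]}
```

Keeping the restriction in `fq` rather than filtering the returned documents client-side means Solr only sends back the matching documents.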