
Hi,

I've noticed some differences when querying Solr with Java and PHP. The query looks like this one here:

text:(www)+timestamp:[2012-04-16T00:00:00Z TO 2012-04-20T23:59:00Z]&q.op=AND&rows=0&sort=timestamp%20desc&facet=true&facet.field=terms_nouns_lemma&facet.limit=20&facet.method=enum

When printing out the number of documents found in Java with

response.getResults().getNumFound()

I get almost 80,000, whereas the equivalent in PHP

$response->response->numFound

returns around 7,000. The PHP result seems more accurate, since only a time frame needs to be considered (and given the nature of the documents stored). But when I go to the admin page and enter my query, I again get around 80,000 (actually the same value as with Java).

What am I missing here?

To me it seems that Java doesn't consider the time frame at all. It may be worth mentioning that I'm using Solr 3.5 (and the corresponding version of the Java library SolrJ).

Note: I think this question is closely related, but it didn't answer mine, as it doesn't take restrictions into account (such as the time frame in the query above).

Additionally, in PHP, if I don't set the number of rows I want in my response, it actually returns the correct number of documents that were found. Is there an equivalent in Java with SolrJ? (By default, if rows isn't set, it is set to 10; setting it to -1 doesn't work either.)

Thanks for any hints

Update

As posted in the comments below, the difference in the query is that SolrJ replaces a blank/space with a "+". I tried escaping it, both hardcoded and with ClientUtils.escapeQueryChars(String), but neither worked as expected.
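For what it's worth, a "+" in place of a blank is what plain form encoding produces. The sketch below uses only the JDK's java.net.URLEncoder and URLDecoder, not SolrJ itself, so it rests on the assumption that the client form-encodes parameter values in the same way before sending them. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class (not part of SolrJ): shows how form encoding treats blanks.
class SpaceEncodingDemo {

    // Form-encode one parameter value; a blank becomes "+".
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Reverse the encoding, as a server would on receipt; "+" becomes a blank again.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String range = "timestamp:[2012-04-16T00:00:00Z TO 2012-04-20T23:59:00Z]";
        String onTheWire = encode(range);
        // prints timestamp%3A%5B2012-04-16T00%3A00%3A00Z+TO+2012-04-20T23%3A59%3A00Z%5D
        System.out.println(onTheWire);
        // prints true: the round trip restores the blanks around TO
        System.out.println(decode(onTheWire).equals(range));
    }
}
```

If that assumption holds, the "+" seen in the log is only the wire representation of the blank, and escaping it in the query string would change the query rather than fix it.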

What's really funny as well:

text:(www)&facet.range=timestamp&f.timestamp.facet.range.end=2012-04-16T21:59:59.000Z&f.timestamp.facet.range.gap=+1MINUTE&rows=0

returns the same number of documents as

text:(www)
divadpoc

1 Answer


Have you validated that the query being executed against the Solr index is the same for both the SolrJ and PHP queries? Especially since you are saying the SolrJ query is not limiting by the date range you have specified, I would be suspicious that something is not being set up or passed correctly from SolrJ.

Also, with regard to returning all the rows, you can set the rows within SolrJ to an absurdly large number (around 100,000), which should work in this case, based on your counts.

Paige Cook
  • thanks, I just realized that with SolrJ a + is inserted instead of a blank, and as blanks are part of the time frame I'm not sure how to escape it (using "\\" didn't work). making use of ClientUtils.escapeQueryChars(String) doesn't work either. any ideas? – divadpoc May 29 '12 at 21:24
  • In your updated section above, the two queries will always return the same thing b/c your query text:(www) is the same in both, the only difference is that you are faceting on the timestamp field in the first query. You need to add the timestamp to the query as you did originally or filter using the parameter `fq` if you want to limit the query to only those items matching the date range. Faceting does not limit the query results it only groups the results found. – Paige Cook May 30 '12 at 11:22
  • ok, thanks. I was/am really confused b/c nothing leads to the supposed output, as the query including the dates is "altered" by SolrJ and escaping does not help, thus I guess the time range is just ignored. – divadpoc May 30 '12 at 17:25
  • Accepted the answer, as it works with the large number for rows. As to the other problem: in `text:(www)+timestamp:[2012-04-16T00:00:00Z TO 2012-04-20T23:59:00Z]&q.op=AND` the _q.op_ is ignored; I had to explicitly write _AND_ between _text_ and _timestamp_. The **+** showing up in the Solr log when passing a query via SolrJ doesn't mean anything. At least it doesn't influence my results; I get the right count now (as confirmed with PHP). Thanks for your support. – divadpoc May 30 '12 at 17:58
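To summarize the two working variants from the comments (an explicit AND inside the query, or the date range moved into `fq`), here is a rough sketch in plain Java. It only builds the parameter strings and uses no SolrJ classes; the class and helper names are made up for illustration. With SolrJ, the values would presumably be passed through SolrQuery's setQuery, addFilterQuery and setRows rather than concatenated into one string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): keeps every Solr parameter as its own key/value pair.
class SolrParamsSketch {

    // Builds a range clause such as timestamp:[from TO to].
    static String rangeClause(String field, String from, String to) {
        return field + ":[" + from + " TO " + to + "]";
    }

    // Variant 1: the operator is written out in q, so it does not depend on q.op.
    static Map<String, String> explicitAnd() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "text:(www) AND "
                + rangeClause("timestamp", "2012-04-16T00:00:00Z", "2012-04-20T23:59:00Z"));
        params.put("rows", "0");
        return params;
    }

    // Variant 2: the date range restricts the result set through a filter query.
    static Map<String, String> withFilterQuery() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "text:(www)");
        params.put("fq", rangeClause("timestamp", "2012-04-16T00:00:00Z", "2012-04-20T23:59:00Z"));
        params.put("rows", "0");
        return params;
    }

    // Form-encodes the parameters into a URL query string.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(explicitAnd()));
        System.out.println(toQueryString(withFilterQuery()));
    }
}
```

Either way, numFound should then reflect only documents inside the time frame, and rows=0 still returns the count without fetching any documents.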