
One thing I really like about Lucene is the query language, which lets me (or an application user) write dynamic queries. I parse these queries via:

QueryParser parser = new QueryParser("", indexWriter.getAnalyzer());
Query query = parser.parse("id:1 OR id:3");

But this does not work for numeric queries like these:

Query query = parser.parse("value:[100 TO 202]"); // Returns nothing
Query query = parser.parse("id:1 OR value:167"); // Returns only the document with ID 1 and not the one with ID 2

On the other hand, it works via the API (but then I give up the convenience of just using the query string as input):

Query query = LongPoint.newRangeQuery("value", 100L, 202L); // Returns 1, 2 and 3

Is this a bug in the query parser, or am I missing an important point, such as QueryParser comparing lexical rather than numerical values? How can I change this while still parsing the query string instead of using the query API?

This question is a follow-up to another question that pointed out the problem, but not the reason: Lucene LongPoint Range search doesn't work

Full code:

package acme.prod;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class LuceneRangeExample {

    public static void main(String[] arguments) throws Exception {
        // Create the index
        Directory searchDirectoryIndex = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(searchDirectoryIndex, new IndexWriterConfig(new StandardAnalyzer()));

        // Add several documents that have an ID and a value
        List<Long> values = Arrays.asList(23L, 145L, 167L, 201L, 20100L);
        int counter = 0;
        for (Long value : values) {
            Document document = new Document();
            document.add(new StringField("id", Integer.toString(counter), Field.Store.YES));
            document.add(new LongPoint("value", value));
            document.add(new StoredField("value", Long.toString(value)));
            indexWriter.addDocument(document);
            indexWriter.commit();
            counter++;
        }

        // Create the reader and search for the range 100 to 202
        IndexReader indexReader = DirectoryReader.open(indexWriter);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        QueryParser parser = new QueryParser("", indexWriter.getAnalyzer());
//        Query query = parser.parse("id:1 OR value:167");
//        Query query = parser.parse("value:[100 TO 202]");
        Query query = LongPoint.newRangeQuery("value", 100L, 202L);
        TopDocs hits = indexSearcher.search(query, 100);
        for (int i = 0; i < hits.scoreDocs.length; i++) {
            int docid = hits.scoreDocs[i].doc;
            Document document = indexSearcher.doc(docid);
            System.out.println("ID: " + document.get("id") + " with range value " + document.get("value"));
        }
    }
}
swaechter

1 Answer


I think there are a few different things to note here:

1. Using the classic parser

As you show in your question, the classic parser supports range searches, as documented here. But the key point to note in the documentation is:

Sorting is done lexicographically.

That is to say, it uses text-based sorting to determine whether a field's values are within the range or not.
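
To make the difference concrete, here is a minimal sketch in plain Java (no Lucene involved) that applies a text-based and a number-based range check to the values from the question. The class and method names are made up for this illustration, and the string comparison only approximates how a term range behaves; it is not Lucene's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: not part of Lucene, only illustrates the two orderings.
public class LexicalVsNumericRange {

    // Text-based check: compares the decimal strings character by character.
    public static boolean inLexicalRange(long value, long lower, long upper) {
        String text = Long.toString(value);
        return text.compareTo(Long.toString(lower)) >= 0
                && text.compareTo(Long.toString(upper)) <= 0;
    }

    // Number-based check: compares the numeric values.
    public static boolean inNumericRange(long value, long lower, long upper) {
        return value >= lower && value <= upper;
    }

    public static List<Long> filterLexical(List<Long> values, long lower, long upper) {
        return values.stream()
                .filter(v -> inLexicalRange(v, lower, upper))
                .collect(Collectors.toList());
    }

    public static List<Long> filterNumeric(List<Long> values, long lower, long upper) {
        return values.stream()
                .filter(v -> inNumericRange(v, lower, upper))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same values and range as in the question
        List<Long> values = Arrays.asList(23L, 145L, 167L, 201L, 20100L);
        System.out.println("Lexical: " + filterLexical(values, 100L, 202L)); // Lexical: [145, 167, 201, 20100]
        System.out.println("Numeric: " + filterNumeric(values, 100L, 202L)); // Numeric: [145, 167, 201]
    }
}
```

As text, "20100" sorts between "100" and "202", so a lexicographic range would wrongly include it. In the question's index the parsed query returns nothing at all, so this sketch only shows why text ordering would give the wrong answer for numbers even if the values were searchable as text.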

However, your field is a LongPoint field (again, as you show in your code). This field stores your data as an array of longs, as shown in the constructor.

This is not lexicographical data - and even when you only have one value, it's not handled as string data.

I assume that this is why the following queries do not work as expected - but I am not 100% sure of this, because I did not find any documentation confirming this:

Query query = parser.parse("id:1 OR value:167");
Query query = parser.parse("value:[100 TO 202]");

(I am slightly surprised that these queries do not throw errors).

2. Using a LongPoint Query

As you have also shown, you can use one of the specialized LongPoint queries to get the results you expect - in your case, you used LongPoint.newRangeQuery("value", 100L, 202L);.

But as you also note, you lose the benefits of the classic parser syntax.

3. Using the Standard Query Parser

This may be a good approach which allows you to continue using your preferred syntax, while also supporting number-based range searches.

The StandardQueryParser is a newer alternative to the classic parser, but it uses the same syntax as the classic parser by default.

This parser lets you configure a "points config map", which tells the parser which fields to handle as numeric data, for operations such as range searches.

For example:

import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.config.PointsConfig;
import java.text.DecimalFormat;
import java.util.Map;
import java.util.HashMap;

...

StandardQueryParser parser = new StandardQueryParser();
parser.setAnalyzer(indexWriter.getAnalyzer());

// Here I am just using the default decimal format - but you can provide
// a specific format string, as needed:
PointsConfig pointsConfig = new PointsConfig(new DecimalFormat(), Long.class);
Map<String, PointsConfig> pointsConfigMap = new HashMap<>();
pointsConfigMap.put("value", pointsConfig);
parser.setPointsConfigMap(pointsConfigMap);

Query query1 = parser.parse("value:[101 TO 203]", "");

Running your index searcher code with the above query gives the following output:

ID: 1 with range value 145
ID: 2 with range value 167
ID: 3 with range value 201

Note that this correctly excludes the 20100L value (which would be included if the query was using lexical sorting).

I don't know of any way to get the same results using only the classic query parser - but at least this is using the same query syntax that you would prefer to use.

andrewJames
  • Thanks a lot for the detailed answer, I wasn't aware of the StandardQueryParser (Works great). Somehow I assumed that the search format is defined by the document field (LongPoint) and not the query itself, but that's not correct. In addition the same field name in different documents could be formatted/stored differently. – swaechter Nov 12 '20 at 08:58