
I am currently having a problem with specifying filters for Lucene/Solr. Every solution I come up with breaks other solutions. Let me start with an example. Assume that we have the following 5 documents:

  • doc1 = [type:Car, sold:false, owner:John]
  • doc2 = [type:Bike, productID:1, owner:Brian]
  • doc3 = [type:Car, sold:true, owner:Mike]
  • doc4 = [type:Bike, productID:2, owner:Josh]
  • doc5 = [type:Car, sold:false, owner:John]

So I need to construct the following filter queries:

  1. Give me all documents of type:Car that have sold:false only, and include any document whose type is different from Car. So basically I want docs 1, 2, 4, 5; the only document I don't want is doc3, because it has sold:true. To put it more precisely:

    for each document d in solr/lucene
    if d.type == Car {
        if d.sold == false, then add to result
        else ignore
    }
    else {
        add to result
    }
    return result
    
  2. Filter in all documents that are of (type:Car and sold:false) or (type:Bike and productID:1). So for this I will get 1,2,5.

  3. If a document has type:Car, include it only when sold:false; otherwise include it only if its owner is John, Brian, or Josh. So for this query I should get 1, 2, 4, 5.

Note: You don't know all the types in the documents. Here it is obvious because of the small number of documents.
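
To make the expected results unambiguous, the three requirements can be sketched as plain Java predicates over the five sample documents. This is only an illustration of the intended logic, not Solr or Lucene code; the class FilterSpec and its helpers are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper class: models the sample documents as field/value maps.
final class FilterSpec {
    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // The five sample documents from the question, in order.
    static Map<String, Map<String, String>> docs() {
        Map<String, Map<String, String>> d = new LinkedHashMap<>();
        d.put("doc1", doc("type", "Car", "sold", "false", "owner", "John"));
        d.put("doc2", doc("type", "Bike", "productID", "1", "owner", "Brian"));
        d.put("doc3", doc("type", "Car", "sold", "true", "owner", "Mike"));
        d.put("doc4", doc("type", "Bike", "productID", "2", "owner", "Josh"));
        d.put("doc5", doc("type", "Car", "sold", "false", "owner", "John"));
        return d;
    }

    // True when the document has the field with exactly this value.
    static boolean has(Map<String, String> d, String field, String value) {
        return value.equals(d.get(field));
    }

    // Filter 1: cars must be unsold; every other type passes.
    static final Predicate<Map<String, String>> FILTER_1 =
        d -> !has(d, "type", "Car") || has(d, "sold", "false");

    // Filter 2: unsold cars, or bikes with productID 1.
    static final Predicate<Map<String, String>> FILTER_2 =
        d -> (has(d, "type", "Car") && has(d, "sold", "false"))
          || (has(d, "type", "Bike") && has(d, "productID", "1"));

    // Filter 3: cars must be unsold; other types must be owned by John, Brian, or Josh.
    static final Predicate<Map<String, String>> FILTER_3 =
        d -> has(d, "type", "Car")
            ? has(d, "sold", "false")
            : has(d, "owner", "John") || has(d, "owner", "Brian") || has(d, "owner", "Josh");

    // Returns the names of the sample documents that pass the filter.
    static List<String> apply(Predicate<Map<String, String>> filter) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : docs().entrySet()) {
            if (filter.test(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(apply(FILTER_1)); // [doc1, doc2, doc4, doc5]
        System.out.println(apply(FILTER_2)); // [doc1, doc2, doc5]
        System.out.println(apply(FILTER_3)); // [doc1, doc2, doc4, doc5]
    }
}
```

None of the predicates enumerates the other types, which is the constraint stated in the note above.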

So my solutions were:

  1. (-type:Car) OR ((type:Car) AND (sold:false)). This works fine and as expected.
  2. ((-type:Car) OR ((type:Car) AND (sold:false))) AND ((-type:Bike) OR ((type:Bike) AND (productID:1))). This solution does not work.
  3. ((owner:John) OR (owner:Brian) OR (owner:Josh)) AND ((-type:Car) OR ((type:Car) AND (sold:false))). This does not work. I can make it work if I do this: ((owner:John) OR (owner:Brian) OR (owner:Josh)) AND ((version:* OR (-type:Car)) OR ((type:Car) AND (sold:false))). I don't understand why the second version works: logically the first one should work too, but Solr/Lucene somehow treats it differently.
Ammar
  • Can you give an example of the sort of solutions you've tried? I imagine that a simple BooleanQuery with each sub-query as a TermQuery with Occur.MUST would work if wrapped with a query filter wrapper (if that sounded like mumbo-jumbo, let me know and I'll turn it into a full answer). – joshlf Jul 17 '13 at 20:14
  • @joshlf13, please do the honor. – SSaikia_JtheRocker Jul 17 '13 at 20:31
  • @joshlf13 I put my solution, please put on your solution if it still applies. – Ammar Jul 17 '13 at 20:32
  • I fail to understand case 1. Why would you get docs 1, 2, 4, and 5 when you are looking for docs with `type:Car` and `sold:false`. Docs 2 and 4 are `type:Bike` – femtoRgon Jul 17 '13 at 20:41
  • Ammar, I put mine as an answer per @JtheRocker's request. – joshlf Jul 17 '13 at 20:45
  • @femtoRgon fixed question 1 to specify precisely what I want. – Ammar Jul 17 '13 at 21:30

2 Answers


Okay, to get anything but a sold car, you could use -(type:Car AND sold:true).

This can be incorporated into the other queries, but you'll need to be careful with lonely negative queries like this. Lucene doesn't handle them well, generally speaking, and Solr has some odd gotchas as well. In particular, A -B reads more like "get all A but forbid B" than "get all A and anything but B". A OR -B has a similar problem; see this question for more.

To get around that, you'll need to surround the negative with an extra set of parentheses, to ensure it is understood by Solr to be a standalone negative query, like: (-(type:Car AND sold:true))
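
To see why a lonely negative is a problem, here is a toy model in plain Java. It is not Lucene or Solr code: NegativeClauseSketch and its methods are made up for illustration, and it assumes the simplified rule that a prohibited clause can only remove documents that some positive clause has already selected.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy model of positive and prohibited clauses as sets of document names.
final class NegativeClauseSketch {
    static Set<String> setOf(String... ids) {
        return new LinkedHashSet<>(Arrays.asList(ids));
    }

    // All five sample documents from the question.
    static final Set<String> ALL = setOf("doc1", "doc2", "doc3", "doc4", "doc5");

    // Documents matching (type:Car AND sold:true) in the sample data.
    static final Set<String> SOLD_CARS = setOf("doc3");

    // Assumed rule: start from what the positive clauses select, then remove
    // whatever the prohibited clauses match.
    static Set<String> evaluate(Set<String> positive, Set<String> prohibited) {
        Set<String> result = new LinkedHashSet<>(positive);
        result.removeAll(prohibited);
        return result;
    }

    // A negative clause on its own: nothing was selected, so nothing is left.
    static Set<String> pureNegative() {
        return evaluate(setOf(), SOLD_CARS);
    }

    // The same negative paired with a clause that selects every document.
    static Set<String> withMatchAll() {
        return evaluate(ALL, SOLD_CARS);
    }

    public static void main(String[] args) {
        System.out.println(pureNegative()); // []
        System.out.println(withMatchAll()); // [doc1, doc2, doc4, doc5]
    }
}
```

Under this model, pairing the negative with a match-all clause such as `*:*` gives it something to subtract from, which is probably why adding `version:*` made the third query in the question work.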

So:

  1. -(type:Car AND sold:true) (This doesn't get the result you stated, but as per my comment, I don't really understand your stated results)

  2. (type:Bike AND productID:1) (-(type:Car AND sold:true)) (You actually wrote this in the description of the problem!)

  3. (-(type:Car AND sold:false)) owner:(John Brian Josh)

femtoRgon
  • The problem with this answer is that you assume that you know all the types of the docs. I want the answer to rely only on the types mentioned in the question. Your answers assume that I know all the types. – Ammar Jul 17 '13 at 21:10
  • It does what you describe. The first query listed will allow only those documents which are not sold and are of type Car, as stated. What else would possibly be included? If you want documents of any type that are not sold, there is no benefit to including a type at all. Simply `sold:false` should be adequate (or if your data is very poorly normalized and you are able to use lonely negative queries, you might have to use `-sold:true`) – femtoRgon Jul 17 '13 at 21:17
  • You have raised a very good point that I can say -sold:false and get what I wanted. However, I cannot assume that another type (let's say type:Airplane) does not have a sold:false. In this query I want to tie each document type to its own specific fields. Interesting point, but still I cannot make that assumption. – Ammar Jul 17 '13 at 21:28

My advice is to use programmatic Lucene (that is, directly in Java using the Java Lucene API) rather than issuing text queries which will be interpreted. This will give you much more fine-grained control.

What you're going to want to do is construct a Lucene Filter Object using the QueryWrapperFilter API. A QueryWrapperFilter is a filter which takes a Lucene Query, and filters out any documents which do not match that query.

In order to use QueryWrapperFilter, you'll need to construct a Query which matches the terms you're interested in. The best way to do this is to use TermQuery:

    TermQuery tq = new TermQuery(new Term("fieldname", "value"));

As you might have guessed, you'll want to replace "fieldname" with the name of a field, and "value" with a desired value. For example, from your example in the OP, you might want to do something like new Term("type", "Car").

This only matches a single term. You're going to need multiple TermQueries, and a way to combine them to create a single, larger query. The best way to do this is with BooleanQuery:

    BooleanQuery bq = new BooleanQuery();
    bq.add(tq, BooleanClause.Occur.MUST);

You can call bq.add as many times as you want, once for each TermQuery that you have. The second argument specifies how the sub-query participates in the match: it can say that a sub-query MUST match, SHOULD match, or MUST_NOT match (these are the three values of the BooleanClause.Occur enum).
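
To make the three Occur values concrete, here is a toy model in plain Java. It is not Lucene code: Bool, term, and doc are made-up stand-ins, and the matching rule is my simplified reading of how a boolean query combines its clauses. It builds filter 2 from the question, (type:Car AND sold:false) OR (type:Bike AND productID:1), by nesting two all-MUST queries as SHOULD clauses of an outer one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model; none of these types are Lucene classes.
final class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Stand-in for a boolean query: an ordered list of (clause, occur) pairs.
    static final class Bool implements Predicate<Map<String, String>> {
        private final List<Predicate<Map<String, String>>> clauses = new ArrayList<>();
        private final List<Occur> occurs = new ArrayList<>();

        Bool add(Predicate<Map<String, String>> clause, Occur occur) {
            clauses.add(clause);
            occurs.add(occur);
            return this;
        }

        @Override
        public boolean test(Map<String, String> doc) {
            boolean hasMust = false;
            boolean anyShould = false;
            for (int i = 0; i < clauses.size(); i++) {
                boolean hit = clauses.get(i).test(doc);
                switch (occurs.get(i)) {
                    case MUST:
                        if (!hit) return false; // a required clause failed
                        hasMust = true;
                        break;
                    case MUST_NOT:
                        if (hit) return false;  // a prohibited clause matched
                        break;
                    case SHOULD:
                        anyShould |= hit;
                        break;
                }
            }
            // Assumed rule: without a MUST clause, at least one SHOULD clause
            // has to match, so an all-MUST_NOT query matches nothing.
            return hasMust || anyShould;
        }
    }

    // Stand-in for a term query: exact match on one field.
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // (type:Car AND sold:false) OR (type:Bike AND productID:1)
    static Bool filter2() {
        Bool unsoldCar = new Bool()
            .add(term("type", "Car"), Occur.MUST)
            .add(term("sold", "false"), Occur.MUST);
        Bool bikeOne = new Bool()
            .add(term("type", "Bike"), Occur.MUST)
            .add(term("productID", "1"), Occur.MUST);
        return new Bool()
            .add(unsoldCar, Occur.SHOULD)
            .add(bikeOne, Occur.SHOULD);
    }

    public static void main(String[] args) {
        System.out.println(filter2().test(doc("type", "Car", "sold", "false")));  // true
        System.out.println(filter2().test(doc("type", "Bike", "productID", "2"))); // false
    }
}
```

The point of the nesting is that AND and OR are not separate operators here: each group is its own query, and the Occur value chosen when adding it decides how it combines with its siblings.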

After you've added each of the sub-queries, this BooleanQuery represents the full query which will match only the documents you ask for. However, it's still not a filter. We now need to feed it to QueryWrapperFilter, which will give us back a filter object:

    QueryWrapperFilter qwf = new QueryWrapperFilter(bq);

That should do it. Then if you want to run queries over only the documents allowed through by that filter, you just take your new query (call it q) and your filter, and create a FilteredQuery:

    FilteredQuery fq = new FilteredQuery(q, qwf);
joshlf
  • How can I use this in the context of Solr? I have a SolrQuery object for the query. Is there a way to convert SolrQuery to Query so that it can be as `q` in `FilteredQuery fq = new FilteredQuery(q, qwf);` – Ammar Jul 17 '13 at 22:04
  • I'm trying to find (and failing, unfortunately) how you use Lucene in Solr. I'd think it would be pretty straightforward, but I guess not. – joshlf Jul 17 '13 at 22:08