In my Lucene.Net index, I have documents with a startDate field and an endDate field. Both fields store dates in yyyyMMdd format. How can I build a query that will return hits if today's date falls between those two dates?

startDateFieldValue < myTargetDate < endDateFieldValue

For example, if myTargetDate is 17760604, I'd want to get a document back that had a startDate field value of 10660101 and an endDate field value of 19990101.

The scenario is that I have a Lucene index whose documents represent particular building sites. Each site has a StartConstruction date and an EndConstruction date. My users will enter a specific date, and I want to find all properties that were under construction on that date.

Note: I'm working with Lucene.Net 1.9, a much older version, and my company can't upgrade (yet).

dthrasher

3 Answers

You can do this with a range query, specifically a NumericRangeQuery. Note that NumericField and NumericRangeQuery are only available from Lucene.Net 2.9 onward. Begin by indexing your dates using a NumericField and adding them to your document:

// Fields.AmendedDate is a field-name constant; the date is indexed as an int such as 20121227.
var df = new NumericField(Fields.AmendedDate);
df.SetIntValue(int.Parse(itemToIndex.startDate.ToString("yyyyMMdd")));
doc.Add(df);

You can make indexing a little faster by reusing a single NumericField instance across many documents (see the documentation). With your dates indexed, you are ready to search across them. To do this, use a NumericRangeQuery:

var q = NumericRangeQuery.NewIntRange(Fields.AmendedDate,
                                      int.Parse(SearchFrom.ToString("yyyyMMdd")),
                                      int.Parse(SearchTo.ToString("yyyyMMdd")),
                                      true, true);

This query can then be used on its own or combined with an existing BooleanQuery:

masterQuery.Add(q, BooleanClause.Occur.MUST);

Searching this way is far faster than a textual term range search because of how numeric fields are indexed. Also, your resolution (in this instance, to the day) can be altered to give a better spread across your data: if you need the hour, minute or second, add them to the format string from most to least significant. Finally, because this is a normal query rather than a filter, you skip the filtering step of the search.
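The point about resolution can be sketched in plain C#, without any Lucene types. This is only an illustration under stated assumptions: ToSortableNumber is a hypothetical helper (not part of Lucene.Net), and the snippet uses top-level statements, so it needs a recent C# compiler. It shows that adding components from most to least significant keeps numeric order equal to chronological order, and that finer resolutions no longer fit in an int.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a Lucene.Net API): encode a timestamp as a number
// whose digits run from most to least significant component.
long ToSortableNumber(DateTime value, string format)
{
    string digits = value.ToString(format, CultureInfo.InvariantCulture);
    return long.Parse(digits, CultureInfo.InvariantCulture);
}

var earlier = new DateTime(2012, 12, 27, 9, 42, 0);
var later = new DateTime(2012, 12, 27, 12, 40, 0);

// Day resolution: both timestamps collapse to the same value (20121227).
long dayA = ToSortableNumber(earlier, "yyyyMMdd");
long dayB = ToSortableNumber(later, "yyyyMMdd");

// Minute resolution: the values differ and still sort chronologically.
long minuteA = ToSortableNumber(earlier, "yyyyMMddHHmm");
long minuteB = ToSortableNumber(later, "yyyyMMddHHmm");

Console.WriteLine(dayA == dayB);           // True
Console.WriteLine(minuteA < minuteB);      // True
Console.WriteLine(minuteB > int.MaxValue); // True: 201212271240 does not fit in an int
```

If you go beyond day resolution, the value would therefore have to be indexed and queried as a long (for example via SetLongValue and NewLongRange) rather than as an int.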

Wolfwyrd
  • I've mirrored this over at my blog - http://leapinggorilla.com/Blog/Read/3/date-range-searches-in-lucene – Wolfwyrd Dec 27 '12 at 12:40
  • Good tip about using a numeric field. I can see how that might speed things up considerably. But to your last point, is there any particular reason to avoid a filter? – dthrasher Dec 27 '12 at 22:09
  • The reason is that filters are (in general) slower than queries. There's a nice post here: http://stackoverflow.com/questions/6462350/is-filtering-faster-than-querying-in-lucene with a comment from one of the Lucene maintainers explaining why. – Wolfwyrd Dec 28 '12 at 09:42

I'm not sure I phrased my question properly. I want to find out if a particular item was active between a start and an end date. The StartDate is stored in one Lucene field, the EndDate in another.

Here's the search snippet I used:

var searchableDate = DateTools.DateToString(dateToSearchFor, DateTools.Resolution.DAY);

var lowerRange = new RangeQuery(null, new Term("StartDate", searchableDate), true);
var upperRange = new RangeQuery(new Term("EndDate", searchableDate), null, true);

var activeTodayFilter = new BooleanQuery();
activeTodayFilter.Add(new BooleanClause(lowerRange, BooleanClause.Occur.MUST));
activeTodayFilter.Add(new BooleanClause(upperRange, BooleanClause.Occur.MUST));
return activeTodayFilter;

I found the solution in an old Lucene forum/newsgroup, but I'm afraid I don't remember the link.

If there's an easier/better way to write the query above, let me know.
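To spell out what the two open-ended ranges express, here is a small sketch in plain C# that applies the same conditions to in-memory data instead of an index. The site names and the IsActiveOn helper are made up for illustration, and the snippet uses top-level statements and tuples, so it needs a recent C# compiler. It relies on the fact that fixed-width yyyyMMdd strings sort in date order when compared as strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for indexed documents: (Name, StartDate, EndDate).
var sites = new List<(string Name, string Start, string End)>
{
    ("Old Keep", "10660101", "19990101"),        // spans the target date
    ("Finished Early", "10660101", "17000101"),  // ended before the target date
    ("Not Yet Started", "18000101", "19990101"), // started after the target date
};

// Mirrors the two MUST clauses: StartDate <= date AND EndDate >= date,
// both inclusive, using ordinal string comparison.
bool IsActiveOn(string startDate, string endDate, string date)
{
    bool startedOnOrBefore = string.CompareOrdinal(startDate, date) <= 0;
    bool endsOnOrAfter = string.CompareOrdinal(endDate, date) >= 0;
    return startedOnOrBefore && endsOnOrAfter;
}

var targetDate = "17760604";
var active = new List<string>();
foreach (var site in sites)
{
    if (IsActiveOn(site.Start, site.End, targetDate))
    {
        active.Add(site.Name);
    }
}

Console.WriteLine(string.Join(", ", active)); // Old Keep
```

Either condition alone would let one of the other two sites through, which is why both clauses are added with MUST.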

dthrasher
  • Thanks a lot, dthrasher. I have been trying to find a way to meet the same requirement for the last week. This made my day :) – Suhani Mody Apr 28 '15 at 06:18

You have to use a RangeQuery.

RangeQuery rq = new RangeQuery(new Term("date", "10660101"), new Term("date", "19990101"), true);

In an up-to-date version you could use NumericField and NumericRangeQuery for better performance.

Jf Beaulac
  • That will work if I'm searching for a range of dates within a single field. But I need to search for a single date that falls between a start and an end field. (In other words, your example is doing the reverse of what I need.) – dthrasher Mar 22 '12 at 07:03
  • RangeQuery would still work; you could use your query date +/- some step. – Mikos Mar 22 '12 at 07:58