A simple, straightforward question: is it possible to sort search results by their creation sequence or date? Adding a date field and sorting by it would be an option. However, the index already exists and contains a huge set of documents, some of which I would like to sort.

pelican_george
  • AFAIK Lucene doesn't store creation date in the index. But aren't document numbers sequential? (I've just checked, and they [aren't necessarily](https://lucene.apache.org/core/3_0_3/fileformats.html#Document%20Numbers)) – biziclop Mar 02 '16 at 10:49
  • Wasn't aware of that fact. So can we assume that a default result set is ordered ascending by doc ID? – pelican_george Mar 02 '16 at 10:53
  • Ah, as I understand it, these IDs are not unique, especially when used across multiple indexes, but they are most likely to reflect a consistent creation sequence. I'm going to dig a bit more. – pelican_george Mar 02 '16 at 11:05
  • Yes, it isn't perfect by any stretch of imagination. Obviously the exact solution would be to timestamp each document, but for a "close enough" solution, something depending on document ids may be possible. But you need to figure out what happens to doc ids during segment merging and other maintenance tasks. – biziclop Mar 02 '16 at 11:24
  • From the trenches: users did not realize how great Lucene was. Once they saw it, new requirements came out of the woodwork every couple of weeks for a while, and I ended up re-indexing 1.5 million documents three times. Maybe just adding the indexed date field and reindexing is the best way to handle it, after making extra-super-sure they aren't ready to spring something else on you. You can shield the users by searching on the existing index while you create a new one. – Michael Gorsich Mar 04 '16 at 13:27
  • I considered the same. However, there are quite a few thousand documents which were created with a huge time gap, whereas the ordering has to be precise. But I guess you might as well draw the line somewhere and start all over from scratch. Oh my, decisions, decisions... – pelican_george Mar 04 '16 at 13:36

1 Answer

Do not use docID for anything (other than getting the doc after a query). docIDs are not guaranteed to be sequential, and they will not necessarily be stable: they can change when segments are merged or when you update a document (an update is effectively a delete followed by an add).

Simply add a field that represents the date and sort by that in your query.
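As a rough sketch of what "add a field and sort by it" could look like, assuming Lucene.NET 3.0.3 (the question does not state a version, and the field names `title` and `created` are made up for illustration). Documents already in the index would not have the field, so they would need to be re-indexed before this sort is meaningful for them:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class CreatedSortSketch
{
    public static void Main()
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // "created" holds any long that grows with time (hypothetical values).
            writer.AddDocument(MakeDoc("second", 20160302112400L));
            writer.AddDocument(MakeDoc("first", 20160302104900L));
            writer.AddDocument(MakeDoc("third", 20160304132700L));
            writer.Commit();
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            // Sort ascending on the numeric field; pass true as a third
            // SortField argument to reverse the order (newest first).
            var sort = new Sort(new SortField("created", SortField.LONG));
            TopDocs hits = searcher.Search(new MatchAllDocsQuery(), null, 10, sort);

            foreach (ScoreDoc hit in hits.ScoreDocs)
            {
                Console.WriteLine(searcher.Doc(hit.Doc).Get("title"));
            }
        }
    }

    // Builds a document with a stored title and an indexed numeric "created" field.
    private static Document MakeDoc(string title, long created)
    {
        var doc = new Document();
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new NumericField("created", Field.Store.YES, true).SetLongValue(created));
        return doc;
    }
}
```

The numeric field has to be indexed (the `true` argument) for sorting to work; storing it is optional and is done here only so the value can be read back from a hit.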

Make it a numeric field and format the number like yyyyMMddHHmmss (24-hour clock, so the values sort chronologically). If you require less precision, just drop some of the digits from the right. Or, if you want better precision, just store the ticks value.

Extension methods FTW!

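    public static class DateTimeExtensions
    {
        // Packs a DateTime into a sortable long of the form yyyyMMddHHmmss.
        public static long AsYMDHMS(this DateTime date)
        {
            // The L suffixes make every term 64-bit arithmetic.
            return
                (date.Year * 10000000000L) +
                (date.Month * 100000000L) +
                (date.Day * 1000000L) +
                (date.Hour * 10000L) +
                (date.Minute * 100L) +
                date.Second;
        }
    }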
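
To illustrate why this packing is safe to sort on, here is a small self-contained sketch. It repeats the extension method so it compiles on its own; the class names and the sample dates are only for illustration. Sorting by the packed value gives the same order as sorting by the dates themselves:

```csharp
using System;
using System.Linq;

public static class DateKeyExtensions
{
    // Same packing as above: yyyyMMddHHmmss stored in a long.
    public static long AsYMDHMS(this DateTime date)
    {
        return date.Year * 10000000000L
             + date.Month * 100000000L
             + date.Day * 1000000L
             + date.Hour * 10000L
             + date.Minute * 100L
             + date.Second;
    }
}

public static class DateKeyDemo
{
    public static void Main()
    {
        var created = new[]
        {
            new DateTime(2016, 3, 4, 13, 27, 0),
            new DateTime(2016, 3, 2, 10, 49, 0),
            new DateTime(2016, 3, 2, 11, 24, 0),
        };

        // Ordering by the packed key is chronological ordering.
        foreach (var key in created.Select(x => x.AsYMDHMS()).OrderBy(k => k))
        {
            Console.WriteLine(key);
        }
        // Prints:
        // 20160302104900
        // 20160302112400
        // 20160304132700
    }
}
```

If second-level precision is not enough, `date.Ticks` is also a `long` and orders the same way, at the cost of values that are harder to read by eye.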
AndyPook