2

I have an object stored in RavenDB with three properties: ID, Score, Date.

I want to create an index for retrieving the top 5 scores within a given date range. However, I only want to retrieve one record per ID. If a single ID shows up more than once in the top scores, I only want to retrieve the highest score for that ID, then move on to the next ID.

example scores:

Score____ID____
1000      1
950       1
900       1
850       2
800       2
750       3
700       4
650       5
600       6
550       7

desired query results:

Score____ID____
1000      1
850       2
750       3
700       4
650       5

I have created an explicit index similar to this (adjusted for simplicity):

            Map = docs => from doc in docs
                      orderby doc.Score descending                          
                      select new
                        {
                            Score = doc.Score,
                            ID = doc.ID,
                            Date = doc.Date
                        };

I call my query with code similar to this (adjusted for simplicity):

                    HighScores = RavenSession.Query<Score, Scores_ByDate>()
                                                .Customize(x => x.WaitForNonStaleResultsAsOfNow())
                                                .Where(x => x.Date > StartDate)
                                                .Where(x => x.Date < EndDate)
                                                .OrderByDescending(x => x.Score)
                                                .Take(5)
                                                .ToList();

I don't know how to say "only give me the results from each ID one time in the list."

user1304444
  • 1,713
  • 3
  • 21
  • 38

2 Answers2

3

So a few pointers:

  • Don't order in the Map function. Maps are designed to just dump documents out.
  • Use the Reduce to do grouping, as this is the way they work by design
  • Add a hint to RavenDB that a particular column will be sorted in code, and what type of field it is.

By default, the map/reduce assumes the sorting is for text, even if it is a number - (I learned this the hard way and got help for it.)

So..

Just define the Map/Reduce index as normal, and add a sort condition at the end, like this:

public class Score_TopScoringIndex : AbstractIndexCreationTask<Score, Score>
{
     public Score_TopScoringIndex()
     {
        Map = docs => from doc in docs
                  select new
                       {
                           Score = doc.Score,
                           ID = doc.ID,
                           Date = doc.Date
                       };

        Reduce = results => from result in results
                        group result by result.ID into g
                        select new 
                        {
                           Score = g.First().Score,
                           ID = g.Key,
                           Date = g.First().Date
                        };

        Sort(x=>x.Score, SortOptions.Int);
     }
}

Make sure the index is in the DB by using at the startup of your application:

IndexCreation.CreateIndexes(typeof(Score_TopScoringIndex).Assembly, documentStore);

Now, when you query, the OrderByDescending, it will be very fast.

using(var session = store.OpenSession())
{
   var highScores = session.Query<Score>("Scores/TopScoringIndex")
                       .OrderByDescending(x=>x.Score)
                       .Take(5);
}
Community
  • 1
  • 1
Dominic Zukiewicz
  • 8,258
  • 8
  • 43
  • 61
-3

You can try using morelinq library

https://code.google.com/p/morelinq/

which has a DistintBy extension.

COLD TOLD
  • 13,513
  • 3
  • 35
  • 52
  • 1
    To use this, I would first need to select an unknown, arbitrary number of records from the database. Then I could use morelinq to filter down for the records I actually want. I'd really like a solution that retrieves 5 records from the database. – user1304444 May 29 '14 at 21:15
  • you can try using it before you take 5 I think you still take select an unknown, arbitrary number since you other operation before you do take – COLD TOLD May 29 '14 at 21:19
  • Actually, with RavenDB, even though the index code is written in c#, the actual index is created at the server. When I call `RavenSession.Query<>().Take(5)` that code goes out to the database, and the database knows to only return 5 documents. – user1304444 May 29 '14 at 22:01