1

I have 50,000 documents in my raven database, but when I I run this query the Id of the latestProfile object is returned as 9999 (the first id in the db is 0, so this is the ten thousandth item).

    //find the profile with the highest ID now existing in the collection
    var latestProfile = session.Query<SiteProfile>()
        .Customize(c => c.WaitForNonStaleResults())
        .OrderByDescending(p => p.Id)
        .FirstOrDefault();

    //lastProfile.Id is 9999 here

    //See how many items there are in the collection. This returns 50,000
    var count = session.Query<SiteProfile>()
        .Customize(c => c.WaitForNonStaleResults()).Count();

My guess is that Raven is paging before my OrderByDescending statement, but

  1. The default page size is 10, and even the max is 1024
  2. All the Parts of this are either IRavenQueryable or IQueryable
  3. It is also not a stale index as I have tested this with WaitForNonStaleResults()

My expected result here is the most recent id I added (50,000) to be the item returned here, but yet it is not.

Why not? This looks like a bug in Raven to me.

EDIT:

Ok, so I now know exactly why, but it still looks like a bug. Here is a list of the items from that same list actualised by a ToArray()

{ Id = 9999 },
{ Id = 9998 },
{ Id = 9997 },
{ Id = 9996 },
{ Id = 9995 },
{ Id = 9994 },
{ Id = 9993 },
{ Id = 9992 },
{ Id = 9991 },
{ Id = 9990 },
{ Id = 999 },    //<-- Whoops! This is text order not int order
{ Id = 9989 }, 

So even though my Id column is an integer because Raven stores it internally as a string it is ordering by that representation. Clearly Ravens Queryable implementation is resolving the ordering before checking types

I have read that you can define sort order to use integer sorting on defined indexes but really, this should not matter. In a strongly typed language integers should be sorted as integers.

Is there a way to make this Id ordering correct? Do I have actually have to resort to creating a special index on the id column just to get integers ordered correctly?

UPDATE 2:

I am now using an index as follows:

    public SiteProfiles_ByProfileId()
    {
        Map = profiles => from profile in profiles
                       select new
                       {
                           profile.Id
                       };

        Sort(x => x.Id, SortOptions.Int);
    }

To try and force it to understand integers. I can see that my index is called via the Raven server console as follows:

Request # 249: GET     -     3 ms - Bede.Profiles - 200 -     /indexes/SiteProfiles/ByProfileId?&pageSize=1&sort=-__document_id&operationHeadersHash=-1789353429
    Query:
    Time: 3 ms
    Index: SiteProfiles/ByProfileId
    Results: 1 returned out of 20,000 total.

but still it comes back with string ordered results. I have seen advice not to use integers as the id, but that would cause massive issues on this project as there are 3rd parties referencing the current ids (in the old service this is designed to replace).

UPDATE 3: I have specific unit test that shows the issue. it appears to work fine for any integer property except for the Id.

    [TestMethod]
    public void Test_IndexAllowsCorrectIntSortingWhenNotId()
    {
        using (var store = new EmbeddableDocumentStore() {RunInMemory = true})
        {
            store.Initialize();
            IndexCreation.CreateIndexes(typeof(MyFakeProfiles_ByProfileId).Assembly, store);

            using (var session = store.OpenSession())
            {
                var profiles = new List<MyFakeProfile>()
                {
                    new MyFakeProfile() { Id=80, Age = 80, FirstName = "Grandpa", LastName = "Joe"},
                    new MyFakeProfile() { Id=9, Age = 9,FirstName = "Jonny", LastName = "Boy"},
                    new MyFakeProfile() { Id=22, Age = 22, FirstName = "John", LastName = "Smith"}
                };

                foreach (var myFakeProfile in profiles)
                {
                    session.Store(myFakeProfile, "MyFakeProfiles/" + myFakeProfile.Id);
                }
                session.SaveChanges();

                var oldestPerson = session.Query<MyFakeProfile>().Customize(c => c.WaitForNonStaleResults())
                    .OrderByDescending(p => p.Age).FirstOrDefault();

                var youngestPerson = session.Query<MyFakeProfile>().Customize(c => c.WaitForNonStaleResults())
                    .OrderBy(p => p.Age).FirstOrDefault();

                var highestId = session.Query<MyFakeProfile>("MyFakeProfiles/ByProfileId").Customize(c => c.WaitForNonStaleResults())
                    .OrderByDescending(p => p.Id).FirstOrDefault();

                var lowestId = session.Query<MyFakeProfile>("MyFakeProfiles/ByProfileId").Customize(c => c.WaitForNonStaleResults())
                    .OrderBy(p => p.Id).FirstOrDefault();

                //sanity checks for ordering in Raven
                Assert.AreEqual(80,oldestPerson.Age); //succeeds
                Assert.AreEqual(9, youngestPerson.Age);//succeeds
                Assert.AreEqual(80, highestId.Id);//fails
                Assert.AreEqual(9, lowestId.Id);//fails
            }
        }
    }

    private void PopulateTestValues(IDocumentSession session)
    {
        var profiles = new List<MyFakeProfile>()
        {
            new MyFakeProfile() { Id=80, Age = 80, FirstName = "Grandpa", LastName = "Joe"},
            new MyFakeProfile() { Id=9, Age = 9,FirstName = "Jonny", LastName = "Boy"},
            new MyFakeProfile() { Id=22, Age = 22, FirstName = "John", LastName = "Smith"}
        };

        foreach (var myFakeProfile in profiles)
        {
            session.Store(myFakeProfile, "MyFakeProfiles/" + myFakeProfile.Id);
        }
    }
}

public class MyFakeProfile
{
    public int Id { get; set; }
    public int Age { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class MyFakeProfiles_ByProfileId : AbstractIndexCreationTask<MyFakeProfile>
{

    // The index name generated by this is going to be SiteProfiles/ByProfileId
    public MyFakeProfiles_ByProfileId()
    {
        Map = profiles => from profile in profiles
                          select new
                          {
                              profile.Id
                          };

        Sort(x => (int)x.Id, SortOptions.Int);
    }
}
Paul Devenney
  • 889
  • 6
  • 18
  • Are you sure it's not chunked to improve performance? What happens if you Enumerate it and select the last item? Is it already a ordered list then why do you orderbydesc and then take the first one when you could take the Last() one? – Thomas Lindvall May 30 '14 at 12:11
  • What does it look like if you don't OrderByDesc? Is the Id attribute something that just counts up or what is it? Do you have any other attribute that you can sort on that's more reliable, such as creation date/time that you can take instead? – Thomas Lindvall May 30 '14 at 15:06
  • I'm doing several bulk imports over a series of months to bring data across from another system. a 3rd system assigned ranges of ids to these systems, so I can't make many assumptions about any one import (I can't just take the highest id of an import and change my hilo to that...) – Paul Devenney May 30 '14 at 16:10
  • What happens if you Sort(x => (int)x.Id) ? – Thomas Lindvall May 30 '14 at 19:17

2 Answers2

0

You need to specify the type of the field on the index, see http://ravendb.net/docs/2.5/client-api/querying/static-indexes/customizing-results-order

Side note, IDs in RavenDB are always strings. You seem to be trying to use integer IDs - don't do that.

synhershko
  • 4,472
  • 1
  • 30
  • 37
  • There is nothing in the Raven documentation saying do not use integer ids, in fact, they provide the support for it if your Id column is an int. My index is public SiteProfiles_ByProfileId() { Map = profiles => from profile in profiles select new { profile.Id }; Sort(x => x.Id, SortOptions.Int); } And I also see the index used in the console GET - 3 ms - Bede.Profiles - 200 - /indexes/SiteProfiles/... Query: Time: 3 ms Index: SiteProfiles/ByProfileId Results: 1 returned out of 20,000 total. – Paul Devenney May 30 '14 at 15:54
  • What happens if you do session.Query() .Customize(c => c.WaitForNonStaleResults()).Sort(x => x.Id, SortOptions.Int).LastOrDefault(); ? RavenDB uses JSON, there for every object is a string until you've deserialized it, SortOptions seems to tell the function that x.Id is a int implicitly. (Hope it makes sense, just got back from the gym) – Thomas Lindvall May 30 '14 at 16:59
  • Sort does not appear to be an option available on the result of Customize() – Paul Devenney Jun 02 '14 at 14:34
  • Paul, this is just syntactic sugar, the IDs are always stored as strings. Having worked in the RavenDB core team, having written most of the docs you refer to, and a book on RavenDB - I can pretty much guarantee my advise to you is precise and accurate. And Sort is not done in the query part, it's on the index creation part. – synhershko Jun 03 '14 at 12:06
0

You can provide multiple Sort field, as you have only defined it for Id:

public SiteProfiles_ByProfileId()
{
    Map = profiles => from profile in profiles
                   select new
                   {
                       profile.Id
                   };

    Sort(x => x.Id, SortOptions.Int);
    Sort(x => x.Age, SortOptions.Int);
}

BUT ... I am unsure of the effects of applying a sort on a field that isn't mapped.

You may have to extend the mapping to select both fields, like this:

public SiteProfiles_ByProfileId()
{
    Map = profiles => from profile in profiles
                   select new
                   {
                       profile.Id,
                       profile.Age
                   };

    Sort(x => x.Id, SortOptions.Int);
    Sort(x => x.Age, SortOptions.Int);
}
Dominic Zukiewicz
  • 8,258
  • 8
  • 43
  • 61