
I am new to RavenDB and could really use some help.

I have a collection of ~20M documents, and I need to add a key to each document. The challenge is that the value of the key needs to be derived from another document.

For instance, given the following document:

{
    "Name" : "001A",
    "Date" : "09-09-2013T00:00:00.0000000",
    "Related" : [
        "002B",
        "003B"
    ]
}

The goal is to add a key that holds the dates of the related documents (here 002B and 003B), obtained by looking up each related document in the collection and reading its Date. For example:

{
    "Name" : "001A",
    "Date" : "09-09-2013T00:00:00.0000000",
    "Related" : [
        "002B",
        "003B"
    ],
    "RelatedDates" : [
        "08-10-2013T00:00:00.0000000",
        "08-15-2013T00:00:00.0000000"
    ]
}
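To make the lookup concrete, here is a minimal in-memory sketch in C# of the transformation described above. It is only an illustration of the logic, not RavenDB code: the `Doc` class and `FillRelatedDates` method are hypothetical, it assumes `Name` is unique across the collection, keeps dates as the raw strings shown in the JSON, and silently skips related names that have no matching document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape, using the field names from the JSON above.
// Dates are kept as the raw strings shown in the question.
public class Doc
{
    public string Name { get; set; }
    public string Date { get; set; }
    public List<string> Related { get; set; } = new List<string>();
    public List<string> RelatedDates { get; set; } = new List<string>();
}

public static class RelatedDatesSketch
{
    // Fills RelatedDates for every document by looking up each Related name.
    // Assumes Name is unique (ToDictionary throws on duplicates); names with
    // no matching document are skipped.
    public static void FillRelatedDates(IList<Doc> docs)
    {
        // Build the Name -> Date lookup once instead of scanning per reference.
        var dateByName = docs.ToDictionary(d => d.Name, d => d.Date);

        foreach (var doc in docs)
        {
            // Preserve the order of the Related list in RelatedDates.
            doc.RelatedDates = doc.Related
                .Where(name => dateByName.ContainsKey(name))
                .Select(name => dateByName[name])
                .ToList();
        }
    }
}
```

Run on the three documents 001A, 002B and 003B, this leaves 001A with the two dates from the expected output above. With ~20M documents, holding the whole lookup in memory may not be practical, which is why doing the lookup page by page against the database is the more realistic route.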

I realize that I'm trying to treat the collection somewhat like a relational database, but this is the form that my data is in to begin with. I would prefer not to put everything into a relational dataset first in order to structure the data for RavenDB.

I first tried doing this on the client side, by paging through the collection and updating the records. However, I quickly hit the maximum number of requests per session.

I then tried patching on the server side with JavaScript, but I'm not sure if this is possible.

At this point I would greatly appreciate some strategic guidance on the right way to approach this problem, as well as more tactical guidance on how to implement it.

Chris

1 Answer


The recommended way of doing this is via a console application that loops through all your records, similar to what you have already done, but that pages the data and opens a new session for each page so you don't hit the maximum number of requests per session.

See this example from the RavenDB source code's example application. You need to do something like this:

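using System;
using System.Linq;
using Raven.Client;
using Raven.Client.Document;

using (var store = new DocumentStore { ConnectionStringName = "RavenDB" }.Initialize())
{
    int start = 0;
    while (true)
    {
        // Open a new session for every page so that no single session
        // exceeds the maximum number of requests per session.
        using (var session = store.OpenSession())
        {
            var posts = session.Query<Post>()
                .OrderBy(x => x.CreatedAt)      // stable order for paging
                .Include(x => x.CommentsId)     // fetch the related docs in the same request
                .Skip(start)
                .Take(128)                      // page size
                .ToList();

            if (posts.Count == 0)
                break;                          // no more documents

            foreach (var post in posts)
            {
                // Served from the session cache thanks to Include, so no extra request.
                session.Load<PostComments>(post.CommentsId).Post = new PostComments.PostReference
                {
                    Id = post.Id,
                    PublishAt = post.PublishAt
                };
            }

            session.SaveChanges();              // one batched write per page
            start += posts.Count;
            Console.WriteLine("Migrated {0}", start);
        }
    }
}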

I've done this sort of thing with ~1.5M records, and the migration wasn't exactly quick. If your records are small, you can just Load<> and SaveChanges each one; in my experience, programmatically patching the documents did not speed things up materially.
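Adapted to the documents in the question, the loop above might look roughly like this. It is an untested sketch against the RavenDB 2.x .NET client: the `Item` class, the server URL and the page size are assumptions, and it further assumes that each entry in `Related` is the document ID of another document. If `Related` only holds `Name` values that are not document IDs, `Include`/`Load` will not find them and you would need a query on `Name` instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client;
using Raven.Client.Document;

// Hypothetical entity matching the question's documents.
// Assumes every entry in Related is another Item's document ID.
public class Item
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Date { get; set; }
    public List<string> Related { get; set; }
    public List<string> RelatedDates { get; set; }
}

public static class AddRelatedDates
{
    public static void Main()
    {
        // Hypothetical server URL.
        using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
        {
            int start = 0;
            while (true)
            {
                // A fresh session per page keeps each session under the request limit.
                using (var session = store.OpenSession())
                {
                    var items = session.Query<Item>()
                        .OrderBy(x => x.Name)           // stable order for paging
                        .Include(x => x.Related)        // fetch related docs in the same request
                        .Skip(start)
                        .Take(128)
                        .ToList();

                    if (items.Count == 0)
                        break;

                    foreach (var item in items)
                    {
                        // Served from the session cache thanks to Include.
                        var related = session.Load<Item>(item.Related ?? new List<string>());
                        item.RelatedDates = related
                            .Where(r => r != null)      // skip IDs that do not exist
                            .Select(r => r.Date)
                            .ToList();
                    }

                    session.SaveChanges();              // persist this page's changes
                    start += items.Count;
                    Console.WriteLine("Migrated {0}", start);
                }
            }
        }
    }
}
```

The design mirrors the answer's loop: `Include` pulls the related documents along with each page, so the per-document `Load` calls do not count as extra requests against the session.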

As a side note, the RavenDB Google Group is very active if you want to ask specifically about doing this from the Studio.

wal
  • Thanks @wal, this is super helpful and I think it's the answer. I'm going to try to cobble something together quickly based on this and see if it works at all. However, at first blush my problem with exceeding the session request limit is fixed. – Chris Sep 19 '13 at 23:14
  • Well it wasn't fast, but it worked, which is all that mattered in this case. – Chris Sep 24 '13 at 18:25