18

I have a requirement where I need to fetch the entire Users collection from RavenDB and compare the retrieved result set with another set of data. There are close to 4,000 records in this particular collection.

Because Raven is Safe By Default, I keep hitting one of its limits: either I get an exception for the number of requests per session being exceeded, or the query returns only the maximum of 128 records.

I don't want to set the property Session.Advanced.MaxNumberOfRequestsPerSession to a higher value.

What query should I use to get the count of all the records? What is the ideal approach to handle this situation?

annantDev
  • In case anyone only goes as far as the first answer: the one by Jon Adams is a good way. Here is a link to an article introducing the "new" feature: https://ayende.com/blog/161249/ravendbs-querying-streaming-unbounded-results – Devon Burriss May 31 '18 at 09:22

6 Answers

21

You use paging, and read the documents 1,024 at a time:

var allUsers = new List<User>();

int start = 0;
while (true)
{
    // Fetch the next page; 1024 is the server's maximum page size.
    var current = session.Query<User>()
        .Skip(start)
        .Take(1024)
        .ToList();

    if (current.Count == 0)
        break;

    start += current.Count;
    allUsers.AddRange(current);
}
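
If all you need is the count of the records (as the question asks), you don't have to fetch any documents at all. A minimal sketch, assuming the client's RavenQueryStatistics support:

    // Request zero documents; only the query statistics come back.
    RavenQueryStatistics stats;
    session.Query<User>()
        .Statistics(out stats)
        .Take(0)
        .ToList();

    var totalCount = stats.TotalResults; // count of all User documents, one request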
Ayende Rahien
  • I see that the above solution works because the total number of records is close to 4,000, so the number of queries would be < 30. Just curious: what would be the way to handle a similar scenario where there are more than 30 * 1024 records, i.e., more than ~31k? – annantDev Jul 02 '12 at 16:24
  • @annantDev You could track the number of requests made within the session. Once it reaches 30, dispose the old session, create a new session, and continue reading. – Arnold Zokas Jul 11 '12 at 12:59
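
A minimal sketch of what that comment describes, using the client's own session.Advanced.NumberOfRequests counter and assuming a documentStore instance is in scope:

    var allUsers = new List<User>();
    var session = documentStore.OpenSession();
    try
    {
        int start = 0;
        while (true)
        {
            // Swap in a fresh session before the 30-requests-per-session limit trips.
            if (session.Advanced.NumberOfRequests >= 30)
            {
                session.Dispose();
                session = documentStore.OpenSession();
            }

            var page = session.Query<User>().Skip(start).Take(1024).ToList();
            if (page.Count == 0)
                break;

            start += page.Count;
            allUsers.AddRange(page);
        }
    }
    finally
    {
        session.Dispose();
    }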
14

This question was posted before this feature was available in RavenDB, but in case anyone else stumbles upon this now...

The encouraged way to do this is via the Streaming API. The RavenDB client batches the stream so it can automatically 'page' the requests/responses to/from the server. If you opt in to using the Streaming API, the client assumes you "know what you're doing" and does not check the 128/1024/30 limits that are used for regular queries.

var allUsers = new List<User>();
var query = session.Query<User>();

// The client streams the results in batches behind the scenes.
using (var enumerator = session.Advanced.Stream(query)) {
    while (enumerator.MoveNext()) {
        allUsers.Add(enumerator.Current.Document);
    }
}

var count = allUsers.Count;

Tip: Though this is the encouraged way to solve the problem... as a general rule it is best to avoid the situation in the first place. What if there are a million records? That allUsers list is going to get huge. Maybe an index or transform could be applied first to filter out just the data you actually need to display or process. Is this for reporting purposes? Maybe RavenDB should be automatically exporting to a SQL Server instance with Reporting Services on it? Etc...
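
For example, if only a field or two is needed, you can at least avoid holding every full document in memory while streaming. A sketch, assuming a hypothetical Email property on User:

    var emails = new List<string>();
    var query = session.Query<User>();

    using (var enumerator = session.Advanced.Stream(query))
    {
        while (enumerator.MoveNext())
        {
            // Keep only the field we care about instead of the whole document.
            emails.Add(enumerator.Current.Document.Email);
        }
    }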

Jon Adams
2

I like Al Dass's solution of getting IDs to operate on instead of complete large objects, and of getting the IDs directly from the index. However, the recursion scares me a bit (even though I think it might be OK), so I replaced it with a loop and removed the reflection.

public List<string> GetAllIds<T>()
{
    var allIds = new List<string>();
    IDocumentSession session = null;

    try
    {
        session = documentStore.OpenSession();
        int queryCount = 0;
        int start = 0;
        while (true)
        {
            // Pull document IDs straight from the index, 1024 per request.
            var current = session.Advanced.DocumentQuery<T>()
                .Skip(start)
                .Take(1024)
                .SelectFields<string>("__document_id")
                .AddOrder("__document_id")
                .ToList();

            if (current.Count == 0)
                break;
            allIds.AddRange(current);

            queryCount += 1;
            start += current.Count;

            // Recycle the session before the 30-requests-per-session limit is hit.
            if (queryCount == 30)
            {
                queryCount = 0;
                session.Dispose();
                session = documentStore.OpenSession();
            }
        }
    }
    finally
    {
        if (session != null)
        {
            session.Dispose();
        }
    }

    return allIds;
}

Also, this has been updated for RavenDB 3.
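
Once you have the IDs, the documents can be processed in small batches instead of all at once. A hypothetical usage sketch, assuming the same documentStore field; a single Load call with many IDs counts as one request:

    var ids = GetAllIds<User>();

    // Load the actual documents back 128 at a time, one short-lived session per batch.
    for (int i = 0; i < ids.Count; i += 128)
    {
        using (var session = documentStore.OpenSession())
        {
            var batch = ids.Skip(i).Take(128).ToArray();
            User[] users = session.Load<User>(batch);
            // ... compare/process this batch ...
        }
    }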

ZNS
1

Building on Ayende's answer, here is a complete method that overcomes the 30-queries-per-session limit and indeed returns all documents of the supplied class:

    public static List<T> getAll<T>(DocumentStore docDB) {
        return getAllFrom(0, new List<T>(), docDB);
    }

    public static List<T> getAllFrom<T>(int startFrom, List<T> list, DocumentStore docDB) {
        var allUsers = list;

        using (var session = docDB.OpenSession())
        {
            int queryCount = 0;
            int start = startFrom;
            while (true)
            {
                var current = session.Query<T>().Skip(start).Take(1024).ToList();
                queryCount += 1;
                if (current.Count == 0)
                    break;

                start += current.Count;
                allUsers.AddRange(current);

                // At the session's request limit: recurse, which opens a fresh session.
                if (queryCount >= 30)
                {
                    return getAllFrom(start, allUsers, docDB);
                }
            }
        }
        return allUsers;
    }

I hope it is not too hacky to do it like this.
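
For completeness, a hypothetical call site (assuming a DocumentStore instance named documentStore):

    // Fetch every User document in one call; sessions are managed internally.
    var everyUser = getAll<User>(documentStore);
    Console.WriteLine("Loaded " + everyUser.Count + " users");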

Capaj
  • The answer by Renato Kovarish has several advantages over this one; this answer uses recursion. – BrokeMyLegBiking Sep 01 '14 at 18:36
  • @BrokeMyLegBiking True, the recursion here is unnecessary. But this one does at least handle the 30-requests-per-session limit that the other referenced answer does not. – Jon Adams Mar 01 '17 at 16:46
1

I honestly prefer the following function:

    public IEnumerable<T> GetAll<T>()
    {
        List<T> list = new List<T>();

        RavenQueryStatistics statistics;

        // The first query returns at most the default 128 results, but the
        // statistics report how many documents exist in total.
        list.AddRange(_session.Query<T>().Statistics(out statistics));
        if (statistics.TotalResults > 128)
        {
            int toTake = statistics.TotalResults - 128;
            int taken = 128;
            while (toTake > 0)
            {
                // Page through the remainder, at most 1024 per request.
                list.AddRange(_session.Query<T>().Skip(taken).Take(toTake > 1024 ? 1024 : toTake));
                toTake -= 1024;
                taken += 1024;
            }
        }

        return list;
    }

[]'s

  • You will hit the 30-requests-per-session limit pretty quickly with this code (as written, anyway) without expanding the request limit. – Jon Adams Mar 01 '17 at 16:44
1

With a slight twist on @capaj's post, here is a generic way of getting all the document IDs as a list of strings. Note the use of Advanced.LuceneQuery<T>(), SelectFields<T>(idPropertyName), and GetProperty(idPropertyName) to keep things generic. The default assumes "Id" is a valid property on the given <T> (which should be the case 99.999% of the time). In the event you have some other property as your ID, you can pass its name in instead.

public static List<string> getAllIds<T>(DocumentStore docDB, string idPropertyName = "Id") {
    return getAllIdsFrom<T>(0, new List<string>(), docDB, idPropertyName);
}

public static List<string> getAllIdsFrom<T>(int startFrom, List<string> list, DocumentStore docDB, string idPropertyName) {
    var allIds = list;

    using (var session = docDB.OpenSession())
    {
        int queryCount = 0;
        int start = startFrom;
        while (true)
        {
            // Project only the ID field instead of pulling whole documents.
            var current = session.Advanced.LuceneQuery<T>()
                .Skip(start)
                .Take(1024)
                .SelectFields<T>(idPropertyName)
                .ToList();
            queryCount += 1;
            if (current.Count == 0)
                break;

            start += current.Count;
            allIds.AddRange(current.Select(t => t.GetType().GetProperty(idPropertyName).GetValue(t, null).ToString()));

            // Recurse with a fresh session a little before the 30-request limit.
            if (queryCount >= 28)
            {
                return getAllIdsFrom<T>(start, allIds, docDB, idPropertyName);
            }
        }
    }
    return allIds;
}

An example of where/how I use this is when making a PatchRequest in RavenDB using the BulkInsert session. In some cases I may have hundreds of thousands of documents and can't afford to load them all into memory just to iterate over them again for the patch operation... hence loading only their string IDs to pass into the Patch command.

void PatchRavenDocs()
{
    var store = new DocumentStore
    {
        Url = "http://localhost:8080",
        DefaultDatabase = "SoMeDaTaBaSeNaMe"
    };

    store.Initialize();

    // >>> here is where I get all the doc IDs for a given type <<<
    var allIds = getAllIds<SoMeDoCuMeNtTyPe>(store);

    // create a new patch to ADD a new int property to my documents
    var patches = new[] { new PatchRequest { Type = PatchCommandType.Set, Name = "SoMeNeWPrOpeRtY", Value = 0 } };

    using (var s = store.BulkInsert())
    {
        int cntr = 0;
        Console.WriteLine("ID Count " + allIds.Count);
        foreach (string id in allIds)
        {
            // apply the patch to my document
            s.DatabaseCommands.Patch(id, patches);

            // spit out a record every 2048 rows as a basic sanity check
            if ((cntr++ % 2048) == 0)
                Console.WriteLine(cntr + " " + id);
        }
    }
}

Hope it helps. :)

Al Dass