
I am working with a document library that can contain both folders and files. I need to fetch all of its content through an API (https://some.domain.com/folder/{id}).

My current logic uses a stack, starting at the root folder:

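    public async Task<List<string>> RetrieveDocuments(string url, string id)
    {
        var files = new List<string>();
        var stack = new Stack<string>();
        stack.Push(id);

        while (stack.Count > 0)
        {
            var folderId = stack.Pop();
            // Call https://some.domain.com/folder/{folderId} for the folder just
            // popped (not the root id every time). GetFolderAsync is a hypothetical
            // helper standing in for "Make An API call".
            var response = await GetFolderAsync(url, folderId);
            if (response != null)
            {
                response.Folders?.ForEach(f => stack.Push(f.FolderId));
                response.Files?.ForEach(f => files.Add(resourceRegex.Replace(f.Path, "/")));
            }
        }

        // files is a List<string>, so the method returns Task<List<string>>
        // rather than Task<Content>.
        return files;
    }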

The problem is that when there are many files in deeply nested folders, this becomes very time-consuming and the function call often times out.

Can anyone suggest a better way of doing this?

  • Is the API under your control? If it is, then you can change it so that your initial call to /folder/{id} returns a list of URLs, each of which points to one of the files in the target folder. You can then call each of those URLs to retrieve each file individually. It means more HTTP requests, but each one is doing a lot less work and returning a lot less data, so is less likely to time out. – sbridewell Sep 06 '21 at 16:39
  • No, I don't have control over it. Can multiple threads help? – Hershika Sharma Sep 07 '21 at 02:30
  • Presumably it's the "make an API call" line which is timing out? If so, and you don't control that API, then the only things you can really do are to increase the call's timeout period (a bit of a sticky plaster solution) or see if you can work out how to request the contents of the document library in smaller chunks. – sbridewell Sep 07 '21 at 06:36

1 Answer


The problem is hard to solve, and it really depends on the folder structure. The biggest issue is that you wait for each request to finish before sending the next one.

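Because you use a stack, the algorithm you're implementing is actually DFS (a queue would make it BFS), but here it makes no difference which one you use, since every folder has to be requested anyway. You can try to make it run in parallel, but that is tricky: you have to know when all in-flight requests, including the ones for subfolders discovered along the way, have finished.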
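A minimal sketch of a fully parallel traversal, in C# like the question. It assumes hypothetical `FolderRef`/`FileRef`/`FolderResponse` types shaped like the question's `response.Folders`/`response.Files`, and takes the API call as a delegate (`fetchFolder`, also hypothetical) so it can be exercised without the real endpoint; the `resourceRegex` path clean-up is left out. A `SemaphoreSlim` caps how many requests are in flight so the API is not flooded.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes mirroring response.Folders / response.Files in the question.
public record FolderRef(string FolderId);
public record FileRef(string Path);
public record FolderResponse(List<FolderRef>? Folders, List<FileRef>? Files);

public static class ParallelCrawler
{
    // Visits every folder reachable from rootId, with at most
    // maxConcurrency calls to fetchFolder running at the same time.
    public static async Task<List<string>> RetrieveDocumentsAsync(
        string rootId,
        Func<string, Task<FolderResponse?>> fetchFolder, // hypothetical wrapper for GET /folder/{id}
        int maxConcurrency = 8)
    {
        var files = new ConcurrentBag<string>();
        using var throttle = new SemaphoreSlim(maxConcurrency);

        async Task VisitAsync(string folderId)
        {
            FolderResponse? response;
            await throttle.WaitAsync();
            try
            {
                response = await fetchFolder(folderId);
            }
            finally
            {
                // Free the slot before recursing so subfolder requests can use it.
                throttle.Release();
            }

            if (response == null) return;
            response.Files?.ForEach(f => files.Add(f.Path));

            // Start every subfolder at once; the semaphore limits real concurrency.
            var children = response.Folders ?? new List<FolderRef>();
            await Task.WhenAll(children.Select(f => VisitAsync(f.FolderId)));
        }

        await VisitAsync(rootId);
        return files.ToList();
    }
}
```

The slot is released before awaiting the subfolders on purpose: if a parent held its slot while waiting for its children, all slots could end up owned by parents waiting on children that can never start. Results come back in no particular order.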

I think you could try a hybrid solution: read the root folder first, and then read all first-level folders in parallel. Here is a hint on how you can send requests in parallel: https://www.michalbialecki.com/2018/04/19/how-to-send-many-requests-in-parallel-in-asp-net-core/
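A rough sketch of that hybrid approach, under assumptions that are not in the question: hypothetical `FolderRef`/`FileRef`/`FolderResponse` types and a hypothetical `fetchFolder` delegate wrapping `/folder/{id}`. The root is read once, then each first-level subfolder is walked with the question's sequential stack loop, and those walks run concurrently via `Task.WhenAll`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shapes mirroring response.Folders / response.Files in the question.
public record FolderRef(string FolderId);
public record FileRef(string Path);
public record FolderResponse(List<FolderRef>? Folders, List<FileRef>? Files);

public static class HybridCrawler
{
    // Sequential stack walk of one subtree, same logic as the question's loop.
    static async Task<List<string>> WalkSubtreeAsync(
        string startId, Func<string, Task<FolderResponse?>> fetchFolder)
    {
        var files = new List<string>();
        var stack = new Stack<string>();
        stack.Push(startId);
        while (stack.Count > 0)
        {
            var response = await fetchFolder(stack.Pop());
            if (response == null) continue;
            response.Folders?.ForEach(f => stack.Push(f.FolderId));
            response.Files?.ForEach(f => files.Add(f.Path));
        }
        return files;
    }

    public static async Task<List<string>> RetrieveDocumentsAsync(
        string rootId, Func<string, Task<FolderResponse?>> fetchFolder)
    {
        // 1. Read the root folder on its own.
        var root = await fetchFolder(rootId);
        if (root == null) return new List<string>();
        var files = root.Files?.Select(f => f.Path).ToList() ?? new List<string>();

        // 2. Walk all first-level subfolders concurrently (each one sequentially inside).
        var subtrees = (root.Folders ?? new List<FolderRef>())
            .Select(f => WalkSubtreeAsync(f.FolderId, fetchFolder));
        foreach (var subtreeFiles in await Task.WhenAll(subtrees))
            files.AddRange(subtreeFiles);
        return files;
    }
}
```

The number of concurrent requests here equals the number of first-level folders, so this helps most when the tree is wide near the top; with hundreds of top-level folders, adding a throttle such as `SemaphoreSlim` would be sensible.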

Anyway, I don't think there is a good solution to this one when you don't have control over the API. You might just need to extend the timeout.
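If the timeout comes from `HttpClient` itself, here is a sketch of raising it, assuming the API is called through one shared `HttpClient` (`FolderApi` and `GetFolderJsonAsync` are hypothetical names). `HttpClient.Timeout` defaults to 100 seconds and applies to each request; it will not help if the limit is imposed by whatever hosts `RetrieveDocuments` (for example a web request or serverless function timeout).

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class FolderApi
{
    // One shared client; Timeout applies to every request sent through it.
    // Five minutes is an arbitrary example value, not a recommendation.
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromMinutes(5)
    };

    // Hypothetical helper for the question's endpoint; returns the raw JSON body.
    public static Task<string> GetFolderJsonAsync(string folderId) =>
        Client.GetStringAsync($"https://some.domain.com/folder/{Uri.EscapeDataString(folderId)}");
}
```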

Good luck! :)

Mik