
I have information about 1000 files in a MongoDB collection. I run a query to fetch the 1000 records and, in a loop, call a function that downloads each file to the local system. So all 1000 files are downloaded sequentially.

I want some parallelism in the download process. In the loop, I want to download 10 files at a time, meaning I call the download function 10 times, and once those 10 downloads have completed I download the next 10 files (calling the download function another 10 times).

How can I achieve this parallelism, or is there a better way to do this?

I saw the Kue npm package, but how would I achieve this with it? By the way, I am downloading from FTP, so I am using the basic-ftp npm package for FTP operations.

James Z
Sudhakar Reddy

2 Answers

0

The async library is very powerful for this, and quite easy too once you understand the basics.

I'd suggest that you use eachLimit, so your app won't have to loop through the files in batches of ten. It simply keeps up to ten files downloading at the same time, starting the next one as soon as a slot frees up.

var async = require('async'); // npm install async

var files = ['a.txt', 'b.txt'];
var concurrency = 10;

// run downloadFile for every entry, at most `concurrency` at a time
async.eachLimit(files, concurrency, downloadFile, onFinish);

function downloadFile(file, callback) {
  // run your download code here
  // when file has downloaded, call callback(null)
  // if there is an error, call callback('error code')
}

function onFinish(err) {
  // eachLimit passes only an error here, never a results array
  if (err) {
    // do something with the error
    return;
  }

  // reaching this point means the files have all downloaded
}

The async library runs up to `concurrency` calls to downloadFile at the same time, passing each call one entry from the files list. When every entry has completed, or as soon as one of them reports an error, it calls onFinish.
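As far as the async documentation goes, the final callback of `eachLimit` receives only an error, so per-file results have to be collected some other way; the library's `mapLimit` is the counterpart that passes a results array. Below is a minimal, dependency-free sketch of the same idea with promises. It is not the library's implementation: `mapWithLimit` and `fakeDownload` are hypothetical names, and the timer merely stands in for a real FTP download.

```javascript
// Hypothetical helper: run `worker` over `items`, at most `limit` at a time,
// and resolve with the results in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each "lane" pulls the next unstarted item as soon as it finishes one,
  // so up to `limit` downloads stay in flight until the list is exhausted.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes); // rejects as soon as any worker rejects
  return results;
}

// Stand-in for a real download: resolves with a made-up local path.
function fakeDownload(file) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`/tmp/${file}`), 10 + 20 * Math.random());
  });
}

// Usage: 25 hypothetical file names, 10 in flight at once.
const files = Array.from({ length: 25 }, (_, i) => `file_${i}.txt`);
const done = mapWithLimit(files, 10, fakeDownload);
done.then((paths) => console.log(`downloaded ${paths.length} files`));
```

Unlike fixed batches of ten, this keeps the pipeline full: a slow file delays only its own lane, not the nine files next to it.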

Graham
  • Thanks, @Graham. It's fairly simple. But nothing is returning from `results` in `onFinish` method. I want fileObject from `downloadFile` method. When the download was success I am calling the callback like this `callback(null, file)` – Sudhakar Reddy Dec 15 '19 at 16:38
  • `results` should be an array of files, if the function was called correctly. Without seeing your code it's tough for me to know what's going on. – Graham Dec 15 '19 at 22:20
  • below is my code. `function downloadFile(file, callback){ console.log(file); callback(null, file); } function onFinish(err, results){ if(err) { console.log('error ', err); } console.log('completed ', results); }` `results` always returns undefined. But in `downloadFile` file name fine. – Sudhakar Reddy Dec 16 '19 at 06:02
0

Without seeing your implementation I can only provide a generic answer.

Let's say that your download function receives one fileId and returns a promise that resolves when said file has finished downloading. For this POC, I will mock that up with a promise that will resolve to the file name after 200 to 500 ms.

function download(fileindex) {
  return new Promise((resolve,reject)=>{ 
    setTimeout(()=>{ 
      resolve(`file_${fileindex}`);
    },200+300*Math.random());
  });
}

You have 1000 files and want to download them in 100 iterations of 10 files each.

Let's encapsulate this. I'll declare a generator function that receives a bucket number and a size, and yields the ids from bucket*size up to (but not including) bucket*size + size.

function* range(bucket, size=10) {
    let start = bucket*size, 
        end=start+size;
    for (let i = start; i < end; i++) {
        yield i;
    }
}

You should create 100 "buckets" containing a reference to 10 files each.

 let buckets = [...range(0,100)].map(bucket=>{
    return [...range(bucket,10)];
 });

At this point, the contents of buckets are:

 [
   [file0 ... file9]
   ...
   [file 990 ... file 999]
 ]
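The buckets above hold numeric indices. With records fetched from a database, the same grouping can be done by slicing the array itself. A minimal sketch, assuming the query result is already an in-memory array; `chunk` and the shape of `records` are hypothetical:

```javascript
// Hypothetical helper: split `items` into consecutive groups of `size`.
// The last group is shorter when items.length is not a multiple of size.
function chunk(items, size = 10) {
  const groups = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// Made-up records standing in for documents fetched from the collection.
const records = Array.from({ length: 1000 }, (_, i) => ({
  remotePath: `/remote/file_${i}.txt`,
}));

const recordBuckets = chunk(records, 10);
console.log(recordBuckets.length);     // 100
console.log(recordBuckets[99].length); // 10
```

Each bucket can then be handed to the download loop as-is, with the download function reading whatever field of the record holds the remote path.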

Then iterate over your buckets with a plain for...of loop inside an async function, so that each await pauses the loop until the current bucket has finished.

On each iteration, start the 10 calls to download and use Promise.all to wait for all of them.

async function proceed() {
    for (const bucket of buckets) {
       // start all downloads of this bucket, then wait for every one of them
       await Promise.all(bucket.map(fileindex => download(fileindex)));
    }
}
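One caveat with `Promise.all`: it rejects as soon as any single download fails, which would end the loop before the remaining buckets start. If a failed file should not stop the rest, `Promise.allSettled` waits for every promise and reports each outcome. A sketch under that assumption; `downloadInBatches` and `flakyDownload` are hypothetical, and the failure rule is invented for the demo:

```javascript
// Hypothetical variant of the batch loop: process `batches` one after another,
// wait for every download in a batch, and record failures instead of throwing.
async function downloadInBatches(batches, download) {
  const succeeded = [];
  const failed = [];
  for (const batch of batches) {
    // allSettled never rejects; each entry is {status, value} or {status, reason}
    const outcomes = await Promise.allSettled(batch.map((id) => download(id)));
    outcomes.forEach((outcome, i) => {
      if (outcome.status === 'fulfilled') {
        succeeded.push(outcome.value);
      } else {
        failed.push({ id: batch[i], reason: outcome.reason.message });
      }
    });
  }
  return { succeeded, failed };
}

// Mock download that fails for every id divisible by 7 (arbitrary demo rule).
function flakyDownload(id) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (id % 7 === 0) reject(new Error(`download ${id} failed`));
      else resolve(`file_${id}`);
    }, 5);
  });
}

const demoBatches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
const summary = downloadInBatches(demoBatches, flakyDownload);
summary.then(({ succeeded, failed }) => {
  console.log(succeeded.length, 'ok,', failed.length, 'failed');
});
```

The `failed` list can be fed back into the same function for a retry pass once the first run is over.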

Let's see a running example (just 10 buckets, we're all busy here :D)

function download(fileindex) {
  return new Promise((resolve, reject) => {
    let file = `file_${fileindex}`;
    setTimeout(() => {
      resolve(file);
    }, 200 + 300 * Math.random());
  });
}

function* range(bucket, size = 10) {
  let start = bucket * size,
    end = start + size;
  for (let i = start; i < end; i++) {
    yield i;
  }
}

let buckets = [...range(0, 10)].map(bucket => {
  return [...range(bucket, 10)];
});

async function proceed() {
  let bucketNumber = 0;
  const timeStart = performance.now();
  // seconds elapsed since timeStart, with one decimal place
  const elapsed = () => ((performance.now() - timeStart) / 1000).toFixed(1);

  for (const bucket of buckets) {
    // start all downloads of this bucket without waiting yet
    const pending = Promise.all(bucket.map(fileindex => download(fileindex)));
    console.log(
      `${elapsed()}s downloading bucket ${bucketNumber}`
    );

    // wait for the whole bucket before moving on to the next one
    const result = await pending;
    console.log(
      `${elapsed()}s bucket ${bucketNumber++} complete:`,
      `[${result[0]} ... ${result[result.length - 1]}]`
    );
  }
}

document.querySelector('#proceed').addEventListener('click',proceed);
<button id="proceed" >Proceed</button>
ffflabs