1

I have an app which has to extract color info from a video and it does this by analyzing each frame. First I extract the frames and then load an array of their locations in memory. As you might imagine, for even a small video it can be in the thousands.

The function I use to extract each frames color info is a promise so I opted to batch an array of promises with Promise.all

With each files absolute path, I read the file with fs and then pass it along to be processed. I've done this with many stand alone images and know the process only takes about a second but suddenly it was taking almost 20min to process 1 image. I finally figured out that using promisify on fs.readFile was what caused the bottle neck. What I don't understand is why?

In the first one fs.readFile is transformed inside of the promise that's returned while in the second one fs.readFile is just used as it normally would be and I wait for resolve to be called. I don't mind using the non-promise one, I'm just curious why this would cause such a slow down?

The second I stopped using promisify the app sped back up to 1 frame / second

The slow code:

   async analyzeVideo(){
    await this._saveVideo();
    await this._extractFrames();
    await this._removeVideo();

    const colorPromises = this.frameExtractor.frames.map(file => {
      return new Promise(resolve => {
        //transform image into data
        const readFile = promisify(fs.readFile);
        readFile(file)
            .then(data => {
              const analyzer = new ColorAnalyzer(data);
              analyzer.init()
                  .then(colors => {
                    resolve(colors)
                  })
            })
            .catch((e)=> console.log(e));
      })
    });
    const colors = await runAllQueries(colorPromises);

    await this._removeFrames();

    this.colors = colors;

    async function runAllQueries(promises) {
      const batches = _.chunk(promises, 50);
      const results = [];
      while (batches.length) {
        const batch = batches.shift();
        const result = await Promise.all(batch)
            .catch(e=>console.log(e));
        results.push(result)
      }
      return _.flatten(results);
    }
  }

The fast code:

async analyzeVideo(){
    await this._saveVideo();
    await this._extractFrames();
    await this._removeVideo();
    const colorPromises = this.frameExtractor.frames.map(file => {
      return new Promise(resolve => {
        //transform image into data
        fs.readFile(file, (err, data) => {
          const analyzer = new ColorAnalyzer(data);
          analyzer.init()
              .then(colors => {
                resolve(colors)
              })
        });
      })
    });
    const colors = await runAllQueries(colorPromises);

    await this._removeFrames();

    this.colors = colors;

    async function runAllQueries(promises) {
      const batches = _.chunk(promises, 50);
      const results = [];
      while (batches.length) {
        const batch = batches.shift();
        const result = await Promise.all(batch)
            .catch(e=>console.log(e));
        results.push(result)
      }
      return _.flatten(results);
    }
  }
  • I doubt it's causing the issue, but you don't need to `promisify` `fs.readFile` N times, do it once at the beginning of the module. – Marcos Casagrande Dec 10 '19 at 19:48
  • @MarcosCasagrande that was actually how I figured out promisify was the issue. when I included it just the once outside the `new Promise()` it was undefined when the promise was actually called –  Dec 10 '19 at 19:51
  • Your `runAllQueries` does not makes sense. By the time you call that function, all the `readFile` calls already are running, you already created the promises. – Bergi Dec 10 '19 at 20:13
  • @Bergi `colorPromises` holds the promises, it doesn't execute them. –  Dec 10 '19 at 20:31
  • @KAT A promise is nothing that can be "executed". It represents the result of an already-started computation, and can be awaited, nothing more. `Promise.all` does not "execute" anything, it just creates a new promise that waits for all the individual ones. It's the `new Promise` call that executes the executor callback, immediately. – Bergi Dec 10 '19 at 21:24
  • @Bergi thank you, I guess I have some learning to do when it comes to promises. It's weird that although it starts right away it can somehow still be attached to promise.all without being started again. –  Dec 10 '19 at 21:40
  • @KAT A promise stores (caches) the result once it has been resolved, so that you can attach arbitrarily many `.then()` handlers at arbitrary times - and get the same value nonetheless – Bergi Dec 10 '19 at 21:41

1 Answers1

1

You don't need to promisify in each loop iteration, just do it once at the top of the module.

Most likely the issue is caused by Promises that are never settled. You are not handling the error correctly, so Promise.all may never finish if an error is thrown.

Instead of logging the error in .catch, you'll have to reject too, or resolve at least if you don't care about the errors. Also analyzer.init() errors are not being catched (if that function can reject)

const readFile = promisify(fs.readFile);
// ...

const colorPromises = this.frameExtractor.frames.map(file => {
   return new Promise((resolve, reject) => {
        //transform image into data
        // const readFile = promisify(fs.readFile);
        readFile(file)
            .then(data => {
              const analyzer = new ColorAnalyzer(data);
              return analyzer.init()   
            })
            .then(resolve) // colors
            .catch((e)=> { 
              reject(e);
              console.log(e)
            });
      })
})

Aside from that runAllQueries is not doing what you think it's doing. You already executed all the promises.

I recommend you use p-limit instead

const pLimit = require('p-limit');

const limit = pLimit(50);

/* ... */

const colorPromises = this.frameExtractor.frames.map(file => {
   return limit(() => {
        return readFile(file)
            .then(data => {
              const analyzer = new ColorAnalyzer(data);
              return analyzer.init()   
            })
            .then(resolve) // colors
    })
})

const colors = await Promise.all(colorPromises);

Furthermore, if you're executing 50 reads at a time, you should increase the value of UV_THREADPOOL_SIZE which defaults to 4.

At your entry point, before any require:

process.env.UV_THREADPOOL_SIZE = 64 // up to 128

Or call the script as: UV_THREADPOOL_SIZE=64 node index.js

Marcos Casagrande
  • 37,983
  • 8
  • 84
  • 98
  • I doubt this is it as when stepping through to try to debug the functions were being properly called, not throwing errors. The loop was just running ridiculously slow. –  Dec 10 '19 at 19:57
  • Do you have a snippet were you can reproduce the issue, that we can run? Is it fixed my moving `promisify` at the top? – Marcos Casagrande Dec 10 '19 at 19:58
  • moving promisify to the top and only calling once resulted in readFile not being in scope when called. Iterating over appx 3000 files at 50 files per promise batch. –  Dec 10 '19 at 20:11
  • Your code is not doing that. I'll update my answer when I have some time. But you're running everything concurrently, the 3k files. – Marcos Casagrande Dec 10 '19 at 20:34
  • I'm curious how `runAllQueries` is incorrect? Is declaring a promise the same as executing it? –  Dec 10 '19 at 21:20
  • When you do: `new Promise(resolve => setTimeout(resolve, 500))`, `setTimeout` call is fired right away. So in your call, you're immediately calling `readFile` – Marcos Casagrande Dec 10 '19 at 21:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/204007/discussion-between-kat-and-marcos-casagrande). –  Dec 10 '19 at 21:32