
The app: I'm building an app that takes screenshots with Puppeteer and returns them in a zip file to a React front end.

Relevant technologies: Node, Express, AdmZip

The issue: I can get the data to the point where it triggers the automatic download, but what gets downloaded does not appear to be a valid zip file. Attempting to unzip it fails with the following error: 'Unable to expand "screenshot-download.zip"'.

Extra context: To confirm that the screenshots were being compressed correctly, I also used the "writeZip" method to write a zip file from the server straight to my local file system (bypassing the conversion to a buffer and the transfer to the client). That zip file worked as expected and had all the correct contents. This leads me to believe that the issue is somewhere in sending the data to the client and converting it into something usable.

App.js code (front end):

fetch(`/dcsgrab?tearsheetUrl=${screenShotData}&imagefilelocation=${imageFileLocationData}`)
  .then((response) => response.json())
  .then((data) => {
    const zipBlob = new Blob(data.zipFile.data);
    const url = window.URL.createObjectURL(zipBlob);
    const zipDownload = document.createElement("a");

    setMessageData(data.message);
    setZipData(data.zipFile);

    zipDownload.href = url;
    zipDownload.download = "screenshot-download.zip";
    document.body.appendChild(zipDownload);
    zipDownload.click();
  });

Console log values from returned data (top) and after it's converted to blob (bottom):

{message: 'Screenshots are done!\nPlease check the root directory you previously designated.', zipFile: {…}}
message: "Screenshots are done!\nPlease check the root directory you previously designated."
zipFile: {type: 'Buffer', data: Array(8207179)}
[[Prototype]]: Object

Blob {size: 21304601, type: ''}
size: 21304601
type: ""
[[Prototype]]: Blob

Server.js code (back end; large chunks of Puppeteer code removed to make it easier to read through, though I can add them back in if necessary):

    app.get('/dcsgrab', (request, response) => {
        const zip = new AdmZip();
    
        (async () => {
    
          /**
           * Screenshot the creative elements on the current page
           * @return {Promise.<Array>} Promise which resolves with an array of clipping paths
           */
            async function getScreenShots() {
                const rects = await page.$$eval(PREVIEW_SELECTOR, iframes => {
                  return Array.from(iframes, (el) => {
                    const {x, y, width, height} = el.getBoundingClientRect();
    
                    return {
                      left: x,
                      top: y,
                      width,
                      height,
                      id: el.id,
                    };
                  });
                }, PREVIEW_SELECTOR).catch(e => {
                  console.error(e.message);
                });
    
                return Promise.all(rects.map(async (rect) => {
                  return await page.screenshot({
                    clip: {
                      x: rect.left,
                      y: rect.top,
                      width: rect.width,
                      height: rect.height,
                    },
                  }).then((content) => {
                    zip.addFile(`screenshot-${screenshotCounter++}.png`, Buffer.from(content, "utf8"), "entry comment goes here");
                    console.log(`${rect.id} element captured and stored in zip`);
                  })
                    .catch((e) => {
                      console.error(e.message);
                    });
                }));
            }

            // Wait a bit then take screenshots
            await new Promise(resolve => setTimeout(resolve, DELAY_FOR_ANIMATION));
            await getScreenShots().catch((e) => console.error(e.message));

            // Continue taking screenshots till there are no pages left
            while (await isNextButtonActive()) {
              await getScreenShots().catch((e) => console.error(e.message));
            }

            await browser.close();
    
            const zipToSend = zip.toBuffer();
    
            response.json({ 
                message: 'Screenshots are done!\nPlease check the root directory you previously designated.',
                zipFile: zipToSend
            });
        })();
    }); 
tganyan
  • I don't see `getScreenShots` being called anywhere. – gre_gor Mar 22 '23 at 18:07
  • And what does `zipToSend` get serialized as? An array of numbers? That would expand the data 4 fold, which defeats the purpose of zipping. – gre_gor Mar 22 '23 at 18:52
  • `data.zipFile.data` appears to have ~8MB of data but the blob has ~21MB. – gre_gor Mar 22 '23 at 19:00
  • And I don't see how React is relevant here. – gre_gor Mar 22 '23 at 19:00
  • @gre_gor Thanks for the input! Answering your questions in subsequent comments below. – tganyan Mar 22 '23 at 19:29
  • getScreenShots : this is a function from the puppeteer portion of the code and not directly related to the issue but I was afraid of cutting too much out and hurting the context of the relevant code. Suffice to say that this part of the code is working correctly and it does get called further down, as mentioned in my "Extra context" section. I've added it back to hopefully reduce any confusion from that. – tganyan Mar 22 '23 at 19:56
  • zipToSend: I'm very new to this kind of thing, so this is just trying to follow the instructions on the admzip readme [here](https://www.npmjs.com/package/adm-zip). In [their wiki](https://github.com/cthackers/adm-zip/wiki/ADM-ZIP#buffer-tobufferfunction-onsuccess-function-onfail-function-onitemstart-function-onitemend) it’s stated that it returns the content of the zip file as a Buffer object. – tganyan Mar 22 '23 at 19:56
  • data.zipFile.data: This seems to be an issue, but I think my lack of understanding on why this is might be core to my inability to get this figured out independently. – tganyan Mar 22 '23 at 19:56
  • React: Agreed. I initially included react here because that’s what the front end is built with, but then trimmed the code down to just the fetch call, making it not relevant (if it ever was). I’ve updated the question accordingly (additionally removed references to puppeteer, as that also isn't apparently material to the actual issue being asked about). – tganyan Mar 22 '23 at 19:57
  • You can also remove all the screenshot generation code and just have the code generate the zip file by including a single file from the filesystem. – gre_gor Mar 22 '23 at 21:25

1 Answer


The Blob constructor accepts an array of ArrayBuffers, TypedArrays, DataViews, Blobs or strings. You are providing an array of numbers, so each number gets converted to a string and the strings are concatenated together.

Demonstration of the problem:

async function blob2string(blob) {
  return new TextDecoder().decode(await blob.arrayBuffer());
}

(async() => {
  const data = [84, 101, 115, 116]; // bytes for the string "Test"

  const wrong_blob = new Blob(data);
  const correct_blob = new Blob([new Uint8Array(data)]);

  console.log("Wrong:", wrong_blob.size, await blob2string(wrong_blob));
  console.log("Correct:", correct_blob.size, await blob2string(correct_blob));
})();
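
To see why the wrong blob ends up larger than the byte array, here is a small sketch that counts the characters produced when every number is turned into its decimal string and the strings are joined, which is effectively what the wrong call stores. The function name `stringifiedLength` is made up for illustration.

```javascript
// Hypothetical helper: length of the text produced when every number in
// the array is converted to a decimal string and the strings are joined,
// mirroring what new Blob(arrayOfNumbers) ends up storing.
function stringifiedLength(bytes) {
  let total = 0;
  for (const b of bytes) {
    total += String(b).length;
  }
  return total;
}

const data = [84, 101, 115, 116]; // bytes for the string "Test"
console.log(stringifiedLength(data)); // 11, from "84101115116"
console.log(data.length);             // 4, the real byte count
```

Byte values from 100 to 255 take three characters each, so the ratio of about 2.6 between the blob size (21304601) and the array length (8207179) in the question is in line with this.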

In your case you need to change

new Blob(data.zipFile.data);

to

new Blob([new Uint8Array(data.zipFile.data)]);
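
A slightly more defensive version of this conversion could look like the sketch below. The helper names `bufferJsonToBytes` and `looksLikeZip` are my own additions, not part of the original code. The signature check assumes a regular, non-empty zip archive, which begins with the bytes `PK\x03\x04`.

```javascript
// Hypothetical helper: turn the JSON form of a Node Buffer,
// { type: "Buffer", data: [...] }, back into raw bytes.
function bufferJsonToBytes(zipFile) {
  if (!zipFile || zipFile.type !== "Buffer" || !Array.isArray(zipFile.data)) {
    throw new TypeError("Expected a JSON-serialized Buffer");
  }
  return new Uint8Array(zipFile.data);
}

// Assumption: a regular, non-empty zip archive begins with "PK\x03\x04".
function looksLikeZip(bytes) {
  const signature = [0x50, 0x4b, 0x03, 0x04];
  return signature.every((value, index) => bytes[index] === value);
}

const bytes = bufferJsonToBytes({
  type: "Buffer",
  data: [0x50, 0x4b, 0x03, 0x04, 0x14],
});
console.log(bytes.length, looksLikeZip(bytes)); // 5 true
```

In the fetch handler the result would then be wrapped as `new Blob([bufferJsonToBytes(data.zipFile)], { type: "application/zip" })`.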

Note that sending a binary file as an array of numbers in JSON expands the data to roughly 2 to 4 times its original size.
In the previous example, the 4 bytes would become 16 bytes as JSON.
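
The overhead is easy to check. This sketch compares the raw byte count with the length of the JSON text for the same bytes; `jsonArrayLength` is an illustrative name.

```javascript
// Hypothetical helper: number of characters needed to represent the bytes
// as a JSON array of numbers.
function jsonArrayLength(bytes) {
  return JSON.stringify(Array.from(bytes)).length;
}

const sample = [84, 101, 115, 116]; // bytes for the string "Test"
console.log(sample.length);           // 4
console.log(JSON.stringify(sample));  // [84,101,115,116]
console.log(jsonArrayLength(sample)); // 16
```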

I would recommend having the server return the zip data directly, so you can use the URL directly on the link instead of going through the fetch, blob and URL conversion.
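
A minimal sketch of that approach, assuming the zip has already been built as a Buffer (for example with `zip.toBuffer()` as in the question). The function names `sendZip` and `buildDownloadUrl` are hypothetical. `setHeader` and `end` are the standard Node response methods, which an Express response object also provides.

```javascript
// Server side (sketch): send the zip bytes as the response body
// instead of wrapping them in JSON.
function sendZip(response, zipBuffer, filename) {
  response.setHeader("Content-Type", "application/zip");
  response.setHeader("Content-Disposition", `attachment; filename="${filename}"`);
  response.setHeader("Content-Length", zipBuffer.length);
  response.end(zipBuffer);
}

// Client side (sketch): build the endpoint URL with encoded query
// parameters so it can be assigned directly to the link's href.
function buildDownloadUrl(tearsheetUrl, imageFileLocation) {
  const params = new URLSearchParams({
    tearsheetUrl,
    imagefilelocation: imageFileLocation,
  });
  return `/dcsgrab?${params.toString()}`;
}
```

With this in place, the route handler would call `sendZip(response, zip.toBuffer(), "screenshot-download.zip")`, and the front end could set `zipDownload.href = buildDownloadUrl(screenShotData, imageFileLocationData)` without any fetch or Blob handling. The status message would then have to be delivered some other way, since the response body no longer carries JSON.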

gre_gor
  • This is great, thank you! My day is nearing its end but I'm going to sit with this for a bit and will update hopefully tomorrow. – tganyan Mar 22 '23 at 21:44
  • Following up here to let you know this solution worked, the zip is getting returned and downloaded with all the expected contents. I'm still tooling around with things to maybe change how I'm returning the data (per your last note), but as of now it's working so thank you very much for the help! – tganyan Mar 23 '23 at 16:31