
I am looking for a Node.js boilerplate, if technically possible, to download a given file and upload it simultaneously as it downloads.

This implies that the whole file is neither downloaded nor stored entirely before being uploaded, so memory usage should not be a concern here.

I understand that streams may be the solution, but I am unsure what that implies in terms of source and destination requirements (multipart support, for instance).

redvivi
  • It will indeed need streams. I don't have time for a full answer, but what you want to look for is a library like node-fetch that can return the body of the response as a ReadableStream, and can take that as the input for another request. As for the multipart support, the documentation should give you more information on what is automated and what you need to look for – DrakaSAN Apr 19 '23 at 08:46
  • https://www.npmjs.com/package/node-fetch#streams has an example of receiving the response as a stream, and https://www.npmjs.com/package/node-fetch#fetchurl-options indicates the `body` can be a ReadableStream – DrakaSAN Apr 19 '23 at 08:52
  • And for the multipart support, a look through the issues of the node-fetch GitHub repository searching for `multipart stream` turns up this one: https://github.com/node-fetch/node-fetch/issues/347, which indicates it is possible and even has example code – DrakaSAN Apr 19 '23 at 08:56
  • Thanks @DrakaSAN! Can you elaborate on whether a multipart stream is a requirement? – redvivi Apr 19 '23 at 14:33
  • It doesn't look like a requirement, so it will depend on your file size and on what the server is ready to accept. – DrakaSAN Apr 19 '23 at 15:38
  • The answers there use `request`, which has since been deprecated, but you can see a very similar question here: https://stackoverflow.com/questions/22186979/download-file-from-url-and-upload-it-to-aws-s3-without-saving-node-js?rq=2 – DrakaSAN Apr 19 '23 at 15:41
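
For reference, a minimal sketch of the streaming approach described in the comments above, assuming node-fetch v2 and hypothetical placeholder URLs (DOWNLOAD_URL and UPLOAD_URL are not real endpoints):

const fetch = require("node-fetch");

const DOWNLOAD_URL = "http://example.com/source-file"; // hypothetical source
const UPLOAD_URL = "http://example.com/upload";        // hypothetical destination

(async () => {
    // node-fetch v2 exposes the response body as a Node.js Readable stream
    const downloadRes = await fetch(DOWNLOAD_URL);

    // pass that stream straight through as the body of the upload request,
    // so data is uploaded as it arrives instead of being buffered first
    const uploadRes = await fetch(UPLOAD_URL, {
        method: "POST",
        body: downloadRes.body
    });

    console.log("Upload status:", uploadRes.status);
})();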

1 Answer


You can do this with the built-in `http` module and streams.

const http = require("http");


const request = (options = {}) => {

    options = Object.assign({
        method: "GET"
    }, options);

    return http.request(options);

};


let download = request({
    hostname: "localhost",
    port: 80,
    path: "/bytes/100"
});

// handle get/download response
download.on("response", (res) => {

    console.log("Response from download", res.headers);

    let upload = request({
        hostname: "localhost",
        port: 80,
        path: "/anything/foo",
        method: "POST"
    });


    // handle post/upload response
    upload.on("response", (res) => {

        let chunks = [];

        res.on("data", (chunk) => {
            chunks.push(chunk)
        });

        res.on("end", () => {
            console.log("Body", Buffer.concat(chunks).toString())
        });

    });

    // pipe download to upload
    res.pipe(upload);

});

download.end();

For testing I used the httpbin container from kennethreitz: `docker run -p 80:80 kennethreitz/httpbin` (see https://httpbin.org/).

The example code above "downloads" 100 bytes from `/bytes/100` and pipes them to `/anything/foo`, which responds with debug information about the request it received.

Example output:

Response from download {
  server: 'gunicorn/19.9.0',
  date: 'Thu, 20 Apr 2023 07:43:08 GMT',
  connection: 'close',
  'content-type': 'application/octet-stream',
  'content-length': '100',
  'access-control-allow-origin': '*',
  'access-control-allow-credentials': 'true'
}
Response from upload {
  server: 'gunicorn/19.9.0',
  date: 'Thu, 20 Apr 2023 07:43:08 GMT',
  connection: 'close',
  'content-type': 'application/json',
  'content-length': '456',
  'access-control-allow-origin': '*',
  'access-control-allow-credentials': 'true'
}
Body {
  "args": {}, 
  "data": "data:application/octet-stream;base64,KnyWTt134iwvCP8AHAx7eXfPrxpjZUuZMiqUI3y/PAemFqBmAGDZNI7IlP5oQ+pUjKYaKPXH3CjI0HeaSrGefPtztVsJh+R+BR8UaCQAzGCpyCS/fR34k26AnG4b+jK8D1A6vA==", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Connection": "close", 
    "Host": "localhost", 
    "Transfer-Encoding": "chunked"
  }, 
  "json": null, 
  "method": "POST", 
  "origin": "192.168.16.1", 
  "url": "http://localhost/anything/foo"
}

In the "upload response" you can see that the uploaded data is encoded as base64 for debugging purpose, but its the exact same data you received from "/bytes/100"

Example code for comparing the downloaded buffer with the uploaded buffer:

download.on("response", (res) => {

    console.log("Response from download", res.headers);

    let recvBuffer = Buffer.alloc(0);
    let sendBuffer = Buffer.alloc(0);

    let upload = request({
        hostname: "localhost",
        port: 80,
        path: "/anything/foo",
        method: "POST"
    });

    let chunks = [];

    res.on("data", (chunk) => {
        chunks.push(chunk);
    });

    res.on("end", () => {
        recvBuffer = Buffer.concat(chunks);
    });

    // handle post/upload response
    upload.on("response", (res) => {

        console.log("Response from upload", res.headers)

        let chunks = [];

        res.on("data", (chunk) => {
            chunks.push(chunk)
        });

        res.on("end", () => {

            // parse the response as JSON & extract the received/sent data
            let json = JSON.parse(Buffer.concat(chunks).toString());
            sendBuffer = Buffer.from(json.data.split(",")[1], "base64");

            console.log("Download = Upload:", Buffer.compare(recvBuffer, sendBuffer) === 0);

        });

    });

    // pipe download to upload
    res.pipe(upload);

});

Since it uses streams, the memory footprint is pretty low.

Note that the exact solution depends on the targets that provide the download/upload endpoints, but since you wanted a boilerplate, this should be a good starting point.
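
As a variation (not part of the original answer), Node's built-in `stream.pipeline` can replace the bare `.pipe()` call so that errors on either the download or the upload stream surface in one place:

const { pipeline } = require("stream");

// inside the download "response" handler, instead of res.pipe(upload):
pipeline(res, upload, (err) => {
    if (err) {
        // an error on either stream ends up here
        console.error("Transfer failed:", err);
    } else {
        console.log("Download piped to upload without errors");
    }
});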

Marc
  • Thanks @Marc for your answer! However, I think I am missing something. Replacing the addresses with valid URLs in the proposed code ends in `error: connect ECONNREFUSED 127.0.0.1:80` in Node.js (of course I am not using any local address). – redvivi Apr 24 '23 at 17:33
  • That means that nothing is running/listening on that address/port. I can only test it locally with the httpbin container when you don't provide real URLs :/ – Marc Apr 24 '23 at 19:12
  • Hello @Marc. Just found the error - the correct option key is `hostname` instead of `url` :-) – redvivi Apr 26 '23 at 15:04
  • Lastly @Marc, given the line `Buffer.concat(chunks).toString()`, does that mean that the whole object is in memory? Say I upload 10 GB, will it take 10 GB of memory? – redvivi Apr 26 '23 at 15:07
  • @redvivi Yes, but this was just to validate/compare the download with the uploaded data. In production you don't need to do that. – Marc Apr 26 '23 at 15:50
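
If you do want to validate a large transfer without buffering it, a sketch (not part of the original answer) using Node's built-in `crypto` module hashes the download on the fly, so memory stays constant regardless of the file size:

const crypto = require("crypto");

// inside the download "response" handler:
const hash = crypto.createHash("sha256");

// hash each chunk as it flows through instead of collecting it in memory
res.on("data", (chunk) => hash.update(chunk));
res.on("end", () => {
    console.log("Download sha256:", hash.digest("hex"));
    // compare against a hash computed the same way on the receiving side
});

// the stream is still piped to the upload as before
res.pipe(upload);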