
I'm using Azure's blockBlobURL.download() to download an image but am only receiving the top portion of the image. Is there a limit to how much I can download from an Azure blob to a readable stream? The content length is 172628, and the stream has a property highWaterMark: 16384. Are these two related?

async function compareToBaseline(imageData, blobName, metadata){

  const baselineBlobName = "MacOSX10.12/chrome/initial"

  const containerURL = ContainerURL.fromServiceURL(serviceURL, "baselines")
  const blockBlobURL = BlockBlobURL.fromContainerURL(containerURL, baselineBlobName );
  let baseLineImage = await blockBlobURL.download(aborter, 0)

  baseLineImage = baseLineImage.originalResponse.readableStreamBody.read()
  console.log(baseLineImage.length);

  baseLineImage = new Buffer(baseLineImage, 'base64');
  await fs.writeFile('./newest.png', baseLineImage, 'binary', function(err){
    console.log('written');
  })
}

The result is only the top portion of an image.

Nick Lee
  • I can't write a proper answer right now, but `read()` only returns a chunk of data; you need to loop until it returns `null`, see [docs](https://nodejs.org/api/stream.html#stream_readable_read_size) – user247702 Jun 05 '19 at 19:54
  • Would I be able to loop over that data and keeping appending the data to a variable and eventually write that data to the filesystem? Or would I need to pipe it continuously to the filesystem? If I understand correctly. – Nick Lee Jun 05 '19 at 20:11
  • Ideally you'd pipe it to the file system instead of first putting the data in a variable, in order to limit memory consumption. In this case you probably won't even need to call `read()` in a loop; I assume `fs` supports writing a stream straight to a file (see the sketch after these comments). – user247702 Jun 05 '19 at 20:49
  • `read()` returning `null` doesn't mean there is no more data in the stream. You need to follow Node.js practices and get all the data by listening to the `data` or `readable` event. For example: ``readable.on('readable', () => { let chunk; while (null !== (chunk = readable.read())) { console.log(`Received ${chunk.length} bytes of data.`); } });`` – Xiaoning Liu - MSFT Jun 13 '19 at 08:44
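A minimal sketch of the piping approach suggested in the comments, assuming the v10 @azure/storage-blob SDK (ContainerURL/BlockBlobURL) that the question uses; the blockBlobURL, aborter, and filename arguments stand in for the question's own values:

const fs = require('fs');

// Sketch: pipe the blob's readable stream straight to a file on disk.
async function saveBlobToFile(blockBlobURL, aborter, filename) {
  const response = await blockBlobURL.download(aborter, 0);
  return new Promise((resolve, reject) => {
    response.readableStreamBody
      .pipe(fs.createWriteStream(filename))
      .on('finish', resolve) // write stream has flushed everything
      .on('error', reject);
  });
}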

3 Answers


There's a 4-MB limit for each call to the Azure Storage service. If your file is larger than 4 MB, you must break it into chunks. For more information, see Azure Storage scalability and performance targets.

Here is sample C# code to download very large files in 1-MB chunks; it is performance-oriented too.

private static void DownloadLargeFile()
{
    string connectionString = "connString"; // or ConfigurationSettings.AppSettings["StorageConnectionString"]
    string sourceContainerName = "quickstartblob"; // or ConfigurationSettings.AppSettings["sourcecontainerName"]
    string sourceBlobFileName = "QuickStart1.txt"; // source blob name
    CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
    var blobClient = account.CreateCloudBlobClient();
    var container = blobClient.GetContainerReference(sourceContainerName);
    var file = sourceBlobFileName;
    var blob = container.GetBlockBlobReference(file);
    // First fetch the size of the blob. We use this to create an empty file with size = blob's size.
    blob.FetchAttributes();
    var blobSize = blob.Properties.Length;
    long blockSize = 1 * 1024 * 1024; // 1-MB chunk
    blockSize = Math.Min(blobSize, blockSize);
    // Create an empty file of the blob's size.
    using (FileStream fs = new FileStream(file, FileMode.Create))
    {
        fs.SetLength(blobSize); // Set its size
    }
    var blobRequestOptions = new BlobRequestOptions
    {
        RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(5), 3),
        MaximumExecutionTime = TimeSpan.FromMinutes(60),
        ServerTimeout = TimeSpan.FromMinutes(60)
    };
    long currentPointer = 0;
    long bytesRemaining = blobSize;
    do
    {
        var bytesToFetch = Math.Min(blockSize, bytesRemaining);
        using (MemoryStream ms = new MemoryStream())
        {
            // Download one range (1 MB by default).
            blob.DownloadRangeToStream(ms, currentPointer, bytesToFetch, null, blobRequestOptions);
            ms.Position = 0;
            var contents = ms.ToArray();
            using (var fs = new FileStream(file, FileMode.Open))
            {
                fs.Position = currentPointer; // Seek to where this chunk belongs.
                fs.Write(contents, 0, contents.Length); // Write the chunk in place.
            }
            currentPointer += contents.Length; // Advance the pointer.
            bytesRemaining -= contents.Length; // Update bytes left to fetch.
        }
    }
    while (bytesRemaining > 0);
}

Something like this in Node.js:

var azure = require('azure-storage');
var fs = require('fs');

module.exports = function (context, input) {

    var accessKey = 'myaccesskey';
    var storageAccount = 'mystorageaccount';
    var containerName = 'mycontainer';

    var blobService = azure.createBlobService(storageAccount, accessKey);

    var recordName = "a_large_movie.mov";
    var blobName = "standard/mov/" + recordName;

    var blobSize;
    var chunkSize = (1024 * 512) * 8; // 4 MB; I'm experimenting with this variable
    var startPos = 0;
    var fullPath = "D:/home/site/wwwroot/myAzureFunction/input/";

    // Fetch the blob's size first, then download it range by range.
    blobService.getBlobProperties(containerName, blobName, null, function (error, blob) {
        if (error) {
            throw error;
        } else {
            blobSize = blob.contentLength;
            context.log('Registered length: ' + blobSize);
            fullPath = fullPath + recordName;
            console.log(fullPath);
            doDownload();
        }
    });

    function doDownload() {
        // Open the file in append mode so each range is added after the last.
        var stream = fs.createWriteStream(fullPath, { flags: 'a' });
        var endPos = startPos + chunkSize;
        if (endPos > blobSize) {
            endPos = blobSize;
            context.log('Reached end of file endPos: ' + endPos);
        }

        context.log("Downloading " + (endPos - startPos) + " bytes starting from " + startPos + " marker.");

        blobService.getBlobToStream(
            containerName,
            blobName,
            stream,
            {
                "rangeStart": startPos,
                "rangeEnd": endPos - 1
            },
            function (error) {
                if (error) {
                    throw error;
                }
                startPos = endPos;
                if (startPos <= blobSize - 1) {
                    doDownload(); // Fetch the next range.
                } else {
                    context.done(); // Signal completion only after the last range.
                }
            }
        );
    }
};
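The azure-storage package above is the older SDK. With the @azure/storage-blob v10 client the question itself uses, download() accepts an offset and an optional count, so the same ranged approach might look like this (a sketch; blockBlobURL and aborter are assumed to be set up as in the question):

const fs = require('fs');

// Sketch: ranged download with the v10 SDK's download(aborter, offset, count).
async function downloadInRanges(blockBlobURL, aborter, filename) {
  const props = await blockBlobURL.getProperties(aborter);
  const blobSize = props.contentLength;
  const chunkSize = 4 * 1024 * 1024; // 4-MB ranges, matching the limit above
  const ws = fs.createWriteStream(filename);
  for (let offset = 0; offset < blobSize; offset += chunkSize) {
    const count = Math.min(chunkSize, blobSize - offset);
    const response = await blockBlobURL.download(aborter, offset, count);
    // Collect this range's chunks and append them to the file in order.
    const chunks = [];
    for await (const chunk of response.readableStreamBody) {
      chunks.push(chunk);
    }
    ws.write(Buffer.concat(chunks));
  }
  ws.end();
}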

Hope it helps.

Mohit Verma

blockBlobURL.download() doesn't have a file-size limit. But read() returning null doesn't mean there is no more data in the stream. The highWaterMark (16384) you noticed is just the stream's internal buffer size, which is likely why a single read() returned only about the first 16 KB of your 172628-byte image. You need to follow Node.js practices and get all the data by listening to the data or readable event.

For example, the data event approach posted by Peter Pan, or the readable event approach from the official Node.js documentation:

readable.on('readable', () => {
  let chunk;
  while (null !== (chunk = readable.read())) {
    console.log(`Received ${chunk.length} bytes of data.`);
  }
});

Always call read() inside the readable event callback.
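Applied to the question's code, the same pattern might look like this (a sketch; baseLineImage is assumed to be the response from blockBlobURL.download(aborter, 0)):

const fs = require('fs');

const readable = baseLineImage.readableStreamBody;
const ws = fs.createWriteStream('./newest.png');

readable.on('readable', () => {
  let chunk;
  // Drain everything currently buffered; later 'readable' events deliver the rest.
  while (null !== (chunk = readable.read())) {
    ws.write(chunk);
  }
});
readable.on('end', () => {
  ws.end(); // close the file once the stream is exhausted
  console.log('written');
});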


It seems that this issue is similar to your other thread, Unable to read readableStreamBody from downloaded blob.

Here is my function to help save baseLineImage.readableStreamBody to a file, as below.

function streamToFs(filename, readableStream) {
    // Return a promise that resolves on "end", so callers can truly await completion.
    return new Promise((resolve, reject) => {
      const ws = fs.createWriteStream(filename);
      readableStream.on("data", data => {
        ws.write(data);
      }).on("end", () => {
        ws.end(); // close the file handle
        console.log('written');
        resolve();
      }).on("error", reject);
    });
}

And change your code as below.

async function compareToBaseline(imageData, blobName, metadata){

  const baselineBlobName = "MacOSX10.12/chrome/initial"

  const containerURL = ContainerURL.fromServiceURL(serviceURL, "baselines");
  const blockBlobURL = BlockBlobURL.fromContainerURL(containerURL, baselineBlobName );
  let baseLineImage = await blockBlobURL.download(aborter, 0);

  await streamToFs('./newest.png', baseLineImage.readableStreamBody);
}

It works. Hope it helps.

Peter Pan