
I need to create a tar.gz archive (.tar.gz) from multiple CSV files stored in an Azure Blob Storage V2 container named input, and save the resulting file in another container named output, using the SharpZipLib library in C# from an Azure Function. A single CSV file can be up to 3 GB.

It works when I download the files from blob storage to the working directory, create the tar.gz there, and upload the result back to blob storage. I want to do this without downloading the files, by building the tar.gz directly on blob storage, because the files are large (around 4 GB).


        public static async void tar()
        {
            // define blobs you need to use
            string connectionString = "XXXX";
            string Azure_container_name = "input";
            List<string> blobs = ListAllBlobsName(connectionString, Azure_container_name);
            
            
            BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
            var sourceContainer = blobServiceClient.GetBlobContainerClient("input");
            var desContainer = blobServiceClient.GetBlobContainerClient("output");
            var desBlob = desContainer.GetBlockBlobClient("file.tar.gz");
            var options = new BlockBlobOpenWriteOptions
            {
                HttpHeaders = new BlobHttpHeaders
                {
                    ContentType = MimeMapping.GetMimeMapping("file.tar.gz"),
                },
            };
            using (var outStream = desBlob.OpenWriteAsync(true, options).GetAwaiter().GetResult())
            using (TarOutputStream tarOutputStream = new TarOutputStream(outStream, Encoding.UTF8))
            {

                foreach (var blob in blobs)
                {
                    var source = sourceContainer.GetBlobClient(blob);
                    Console.WriteLine("Adding file "+blob + " in tar zip");
                    Azure.Storage.Blobs.Models.BlobProperties properties = source.GetPropertiesAsync().GetAwaiter().GetResult();
                    var entry = TarEntry.CreateTarEntry(blob);
                    entry.Size = properties.ContentLength;
                    tarOutputStream.PutNextEntry(entry);
                    source.DownloadToAsync(tarOutputStream).GetAwaiter().GetResult();
                    tarOutputStream.CloseEntry();
                    Console.WriteLine("Added file " + blob + " in tar zip");
                    Console.WriteLine();
                }
                tarOutputStream.Finish();
                tarOutputStream.Close();
            }
        }

        

1 Answer


Regarding the issue, please refer to the following code:

            // define blobs you need to use
            string[] blobs = { "",... };
            string connectionString = "";
            BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
            var sourceContainer = blobServiceClient.GetBlobContainerClient("input");
            var desContainer = blobServiceClient.GetBlobContainerClient("output");
            var desBlob = desContainer.GetBlockBlobClient("csv.tar.gz");
            var options = new BlockBlobOpenWriteOptions {
                HttpHeaders = new BlobHttpHeaders {
                    ContentType = MimeMapping.GetMimeMapping("csv.tar.gz"),
                },
            };

            using (var outStream = await desBlob.OpenWriteAsync(true, options).ConfigureAwait(false))
            using (TarOutputStream tarOutputStream = new TarOutputStream(outStream, Encoding.UTF8)) {
                
                foreach (var blob in blobs) {
                    var source = sourceContainer.GetBlobClient(blob);

                    BlobProperties properties = await source.GetPropertiesAsync().ConfigureAwait(false);
                    var entry = TarEntry.CreateTarEntry(blob);
                    entry.Size = properties.ContentLength;
                    tarOutputStream.PutNextEntry(entry);
                    await source.DownloadToAsync(tarOutputStream);
                    tarOutputStream.CloseEntry();
                }
                tarOutputStream.Finish();
                tarOutputStream.Close();
            }
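
One thing worth noting about the snippet above: `TarOutputStream` on its own produces an uncompressed tar, so a blob named `csv.tar.gz` would not actually be gzip-compressed. Below is a minimal sketch of one way to layer compression, assuming SharpZipLib's `GZipOutputStream` is wrapped around the destination stream. The class `TarGzSketch`, the method `WriteTarGz`, and the tuple shape are hypothetical names used only for illustration, and the entry data is copied synchronously with `Stream.CopyTo`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

public static class TarGzSketch
{
    // Hypothetical helper: writes every (Name, Content, Length) item as one tar entry
    // and gzip-compresses the archive as it is written to `destination`.
    public static void WriteTarGz(
        Stream destination,
        IEnumerable<(string Name, Stream Content, long Length)> files)
    {
        // TarOutputStream -> GZipOutputStream -> destination.
        // Disposing the tar stream finishes the archive and closes the streams it wraps.
        using (var tar = new TarOutputStream(new GZipOutputStream(destination), Encoding.UTF8))
        {
            foreach (var file in files)
            {
                var entry = TarEntry.CreateTarEntry(file.Name);
                entry.Size = file.Length;      // must equal the number of bytes written below
                tar.PutNextEntry(entry);
                file.Content.CopyTo(tar);      // synchronous copy into the current entry
                tar.CloseEntry();
            }
        }
    }
}
```

With the Azure SDK, `destination` would presumably be the stream returned by `OpenWriteAsync`, each `Content` a read stream over a source blob, and `Length` the value of `properties.ContentLength`.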
Jim Xu
  • I tried the code, but it throws an error: "Entry closed at '0' before the '919379' bytes specified in the header were written". I am trying to tar-zip two files, one of 10 MB and the other of 2 MB. I updated the code as follows: `using (var outStream = desBlob.OpenWriteAsync(true, options).GetAwaiter().GetResult())` and `BlobProperties properties = source.GetPropertiesAsync().GetAwaiter().GetResult();` – Santosh Jindal Nov 19 '20 at 08:05
  • @SantoshJindal please refer to https://github.com/icsharpcode/SharpZipLib/blob/master/src/ICSharpCode.SharpZipLib/Tar/TarOutputStream.cs#L337. It seems that not all of the bytes are being read from the Azure blob. – Jim Xu Nov 19 '20 at 08:10
  • @SantoshJindal In this situation, you can try the synchronous method `DownloadTo()` to download the contents into your tar stream. – Jim Xu Nov 20 '20 at 01:18
  • Many thanks! The error is gone. But I am facing another issue: the whole process never finishes for CSVs ranging in size from 1 KB to 3 GB. It successfully added two files of 2 KB and 2 MB, but once it started on a 1 GB file it has been running for the last 1.5 hours. The size of csv.tar.gz grows every second, but the process never ends. Thanks! – Santosh Jindal Nov 20 '20 at 10:06
  • @SantoshJindal Since your file is very big, I suggest you download the blob in chunks with the method `Download(HttpRange, BlobRequestConditions, Boolean, CancellationToken)` and then use `tarOutputStream.Write()` to write each chunk to the tar file. That way, we can track the progress. Also, could you please provide your code? – Jim Xu Nov 23 '20 at 06:13
  • I have added my code to the original question. Thanks! @Jim Xu – Santosh Jindal Nov 23 '20 at 08:49
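
The chunked approach suggested in the comments could be sketched roughly as follows. This is an illustration under stated assumptions, not tested against a 3 GB blob: `ChunkedCopy`, `CopyInRanges`, `openRange`, and `onProgress` are hypothetical names, and the range-opening step is a callback so that the copy loop itself depends only on the standard library.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Copies `totalLength` bytes into `destination`, one byte range at a time.
    // `openRange(offset, count)` is a hypothetical callback returning a stream over that range.
    // `onProgress` (optional) receives the running total after each completed range.
    public static long CopyInRanges(
        Func<long, long, Stream> openRange,
        long totalLength,
        Stream destination,
        int chunkSize,
        Action<long> onProgress = null)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[81920];
        long written = 0;

        while (written < totalLength)
        {
            long count = Math.Min(chunkSize, totalLength - written);
            long rangeWritten = 0;

            using (Stream range = openRange(written, count))
            {
                int read;
                while ((read = range.Read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.Write(buffer, 0, read);
                    rangeWritten += read;
                }
            }

            // Fail early rather than produce a tar entry whose size disagrees with its header.
            if (rangeWritten != count)
                throw new IOException($"Expected {count} bytes at offset {written}, got {rangeWritten}.");

            written += rangeWritten;
            onProgress?.Invoke(written);
        }

        return written;
    }
}
```

In the code from the question, this would sit between `PutNextEntry` and `CloseEntry`, with `tarOutputStream` as `destination` and `properties.ContentLength` as `totalLength`. Assuming the `Download(HttpRange, ...)` overload mentioned above, `openRange` could be `(offset, count) => source.Download(new HttpRange(offset, count)).Value.Content`, and `onProgress` could log the byte count to show whether the copy is still advancing.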