
BACKGROUND INFO: I'm using the .NET Framework and ASP.NET MVC.

Here's my dilemma: I'm currently using a service to open a group of files (from a SQL Server database). The time it takes the service to return a whole file is directly proportional to the file's size. I take the file and stream it to the web browser from my web app. As you can imagine, this isn't very scalable: the browser times out for any file over about 500 MB, because it takes too long before streaming starts. So the solution we're using is "chunking" the data: I take 64KB pieces of the data from the service and stream them to the browser right away.
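
Roughly, the chunked streaming looks like the following simplified sketch (an MVC controller action; fileService.GetChunk is a placeholder for the actual service call, which returns the next 64KB block or an empty array when the file is exhausted):

// Simplified sketch of the current per-file chunked streaming.
// fileService.GetChunk(name) is a stand-in for the real service call.
public void StreamFile(string filename) {
    Response.ContentType = "application/octet-stream";
    Response.AddHeader("Content-Disposition", "attachment; filename=" + filename);
    Response.BufferOutput = false; // push each chunk to the browser as soon as it arrives

    byte[] chunk;
    while ((chunk = fileService.GetChunk(filename)).Length > 0) {
        Response.OutputStream.Write(chunk, 0, chunk.Length);
        Response.Flush();
    }
}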

This works great for a single file. However, we have a requirement that multiple files be compressed into a single file. The problem with compression is that I need to download all of the files in whole from the service before I can start streaming the compressed package. I think I know the answer to this question, but I'll ask anyway: is there a way to stream a group of files as they're being compressed? I highly doubt it, since the compression algorithm would need to see the files in whole. Alternatively, is there a JavaScript package out there that could capture the files individually (as they're streamed) and then compress them once the streaming is done? I'd appreciate any advice on this!

TheDude

1 Answer


There seems to be a package out there for zipping on the client side, JSZip. Note you'd need Downloadify to then create the file on the user's computer. It doesn't look well supported across browsers, though, and the amount of data you'd be throwing around in JavaScript on the client could cause issues.

Instead of sending a zip file, could you look at streaming a different archive format such as a TAR or ISO file? Such a format just contains metadata about each file followed by the file data, so it can be written out sequentially.

Alternatively, you could borrow a solution used by the 7digital and Bleep record music stores, which is to zip the files on the server to a temporary directory while presenting a page immediately to the user. The page uses a piece of JS on the client side to poll the server until the whole file is ready for download, then it can start that download as per normal.
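
A minimal sketch of that approach, assuming an ASP.NET MVC controller (System.Web.Mvc); ZipJobStore and its StartZipJob/IsComplete/GetZipPath methods are hypothetical stand-ins for however you run and track the background zipping work:

// Hypothetical sketch of the "zip to a temp folder, then poll" approach.
// ZipJobStore is a made-up helper that zips the files on a background thread
// and remembers whether the archive for a given job is finished.
public class DownloadController : Controller {

    public ActionResult Prepare(string[] filenames) {
        string jobId = Guid.NewGuid().ToString("N");
        ZipJobStore.StartZipJob(jobId, filenames, Server.MapPath("~/App_Data/zips"));
        ViewBag.JobId = jobId;
        return View(); // the page contains the JS that polls Status
    }

    public JsonResult Status(string jobId) {
        // Called by the client-side JS every few seconds
        bool ready = ZipJobStore.IsComplete(jobId);
        return Json(new { ready = ready, url = ready ? Url.Action("Fetch", new { jobId = jobId }) : null },
                    JsonRequestBehavior.AllowGet);
    }

    public ActionResult Fetch(string jobId) {
        // Once ready, serve the finished archive from the temporary directory
        return File(ZipJobStore.GetZipPath(jobId), "application/zip", "package.zip");
    }
}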

Update

I noticed that if you download a directory from the Dropbox website, the download starts immediately and the full file size isn't known, which indicates that the download starts before the archive has finished being created. A further read into the zip file format and the DEFLATE algorithm suggests that you can start generating compressed data and streaming it to the client before you have the full file data from the service.

The code would look something like the following untested and simplified example (using DotNetZip class names):

// Wrap the response stream so zipped data goes straight to the client
using (var zipStream = new ZipOutputStream(Response.OutputStream)) {

    foreach (var filename in filenames) {
        // Write the local file header for this entry
        zipStream.PutNextEntry(filename);

        // Write the file data in chunks as it arrives from the service
        byte[] chunk;
        while ((chunk = service.GetChunk(filename)).Length > 0) {
            zipStream.Write(chunk, 0, chunk.Length);
        }
    }

    // Disposing the stream writes the central directory and completes the zip file
}

If you want the files to be compressed further (which may be the case if you give the compressor larger blocks to work with), but you still want to start streaming data as soon as possible, and you know that data comes from the service to your application faster than it goes from your application to your client, you could implement some sort of exponential buffer inside the foreach loop.

int chunksPerWrite = 1; // better declared outside the foreach loop so the growth persists across files
byte[] chunk;
var chunks = new List<byte[]>();
while ((chunk = service.GetChunk(filename)).Length > 0) {
    chunks.Add(chunk);

    if (chunks.Count >= chunksPerWrite) {
        // Combine all the buffered chunks into one larger block (see the helper sketch below)
        byte[] megaChunk = CombineAllChunks(chunks);
        zipStream.Write(megaChunk, 0, megaChunk.Length);
        chunks.Clear();        // start buffering the next, larger batch
        chunksPerWrite *= 2;   // or chunksPerWrite++ for linear growth
    }
}

// Cut for brevity - combine any remaining chunks and write them to the zipStream.
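
For completeness, CombineAllChunks could be something as simple as the following (plain array concatenation, nothing DotNetZip-specific; Sum needs System.Linq):

// One possible implementation of the CombineAllChunks helper used above:
// concatenate the buffered 64KB chunks into a single byte array.
static byte[] CombineAllChunks(List<byte[]> chunks) {
    var combined = new byte[chunks.Sum(c => c.Length)];
    int offset = 0;
    foreach (var chunk in chunks) {
        Buffer.BlockCopy(chunk, 0, combined, offset, chunk.Length);
        offset += chunk.Length;
    }
    return combined;
}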

My reading of the ZIP specification suggests there would be a limit to how much data can be effectively compressed in a single go, but I can't work out what that limit is (it might depend on the data?). I would be very interested to hear from anyone who knows the spec better...

If you find you need to roll your own for some reason, zip files also have a plain storage mode with no compression at all, which makes things much easier if you're not concerned about bandwidth.
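
If you went that way with DotNetZip, I believe switching the output stream to no compression is a one-line change; treat this as an untested sketch (it assumes ZipOutputStream's CompressionLevel property behaves as I expect):

// Untested sketch: store entries without compressing them, so the server only
// has to frame the data rather than run it through DEFLATE.
using (var zipStream = new ZipOutputStream(Response.OutputStream)) {
    zipStream.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
    // ...then write entries and chunks exactly as in the example above...
}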

Rob Church
  • @TheDude I've updated with more information about streaming zip files - you shouldn't need to download all the file data before sending compressed data to the client. – Rob Church Jul 16 '13 at 14:06
  • I appreciate the update. I will read up on those articles you linked. – TheDude Jul 16 '13 at 15:09
  • I'm actually downloading the file(s) from a separate service in 64KB chunks. If a file is divided into 50 64KB chunks and I zip those individually as I get them and stream them right away to the browser, the single PACKAGE.ZIP file will basically contain 50 separate zipped pieces. How does a single unzip algorithm deal with that? – TheDude Jul 16 '13 at 15:13
  • Looking at the zip file format, it won't matter if you stream the data in 64kb chunks, as long as you write a file header before each file and then correctly write the directory at the end. Take a look at http://stackoverflow.com/questions/4733707/zip-subfolders-using-zipoutputstream, in it the file header is written (the CRC and file sizes can be left blank according to the file spec on Wikipedia) then the file is added to the stream. If you add a header, you can then add extra data to the output stream in chunks and only add a new file header when it's time to add a new file. – Rob Church Jul 17 '13 at 08:37
  • So, essentially, you'd have to put a header in front of every single 64KB block to indicate that it's a part of a certain file and belongs to a certain directory. I have no idea how to do such a thing, but it would make sense since the unzip algorithm will need to know how to reconstruct all those small pieces. I'm curious. If such a thing is possible, why do Google Drive and Sky Drive not use such a process? Both of them do all the zipping ahead of time (when the files are whole) and then stream. It seems that if they can zip and stream right away, that might be faster. – TheDude Jul 17 '13 at 13:08
  • I've added some code to explain how I think you can get this to work. – Rob Church Jul 17 '13 at 14:06
  • Hey Rob, I appreciate your help. However, I don't think it will be very efficient to compress and stream small pieces of a file. Fundamentally, the compression algorithm needs to be able to see the entire file or files to be able to map out all the redundancies effectively. If it's only looking at small pieces at a time, it's missing all kinds of other redundancies outside of that small piece. – TheDude Jul 17 '13 at 17:03
  • Altered answer to address that issue, while still starting the download ASAP. TBH, I didn't think compression was an issue for you, I thought it was more an issue of downloading multiple files in a single go... – Rob Church Jul 18 '13 at 08:40
  • Yes, good compression is definitely preferred. But thanks for the info on how to combine the multiple files in one file without the compression. It was educational for me. – TheDude Jul 18 '13 at 16:11
  • There's a paid component that explicitly says it can do this: http://xceed.com/ZipRT_Net_Intro.html?adtype=152 – Rob Church Jul 31 '13 at 15:43
  • Hey Rob. Thanks for this awesome bit of advice! I tried the free trial of Xceed Real-Time Zip with our project and it worked like a charm! Do you know if there's a free version of this or is Xceed the only game in town? As awesome as it is, it's pretty expensive. I just want to make sure there's no open source alternative before taking the plunge. Thanks! – TheDude Aug 05 '13 at 04:04
  • Btw, I had just read your updated solution without compression. Would this free alternative be possible with compression? Also, would I need to include the DotNetZip library or is this possible with SevenZipSharp as well? – TheDude Aug 05 '13 at 04:40