We're getting some very large .tar.gz files that we want to load into SQL Server (like 3 gig gzipped, 20 gig uncompressed).
I thought I would try to save both some time and disk space by writing a C# layer to decompress and stream to SQL Server on the fly using named pipes.
I found this: Can Sql Server BULK INSERT read from a named pipe/fifo?
and both answers had some "we don't know why this is necessary, but it doesn't work otherwise" comments:
- the need to open 2 pipes, and
- the need to do it all in one big write to SQL Server
It's not really practical to do 1 big 20 gig write, so I was tinkering with the code in the above solutions.
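For context, what I'm running is roughly the sketch below: a simplified, single-pipe version of the idea from that question. The pipe name, table, connection string, and file path are placeholders, and I'm glossing over the tar handling; the point is just the overall shape (BULK INSERT reading from the pipe while the C# side decompresses into it).

using System;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;
using System.IO.Pipes;
using System.Threading.Tasks;

class PipeBulkLoad
{
    static void Main()
    {
        const string pipeName = "bulk_pipe";   // placeholder

        using (var pipeStream = new NamedPipeServerStream(pipeName, PipeDirection.Out, 1,
                   PipeTransmissionMode.Byte, PipeOptions.Asynchronous))
        {
            // BULK INSERT runs on its own thread and connects to the pipe as the reader
            var bulkTask = Task.Run(() =>
            {
                using (var conn = new SqlConnection("Server=.;Database=Staging;Integrated Security=SSPI"))
                using (var cmd = new SqlCommand(
                    $@"BULK INSERT dbo.StagingTable FROM '\\.\pipe\{pipeName}' WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')",
                    conn))
                {
                    conn.Open();
                    cmd.CommandTimeout = 0;
                    cmd.ExecuteNonQuery();
                }
            });

            pipeStream.WaitForConnection();

            using (var file = File.OpenRead(@"C:\data\big.tar.gz"))   // placeholder path
            using (var inputStream = new GZipStream(file, CompressionMode.Decompress))
            {
                // ... the copy loop shown below goes here ...
            }

            pipeStream.Close();   // EOF so BULK INSERT can finish
            bulkTask.Wait();
        }
    }
}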
What I found was that when I do something like this:
byte[] buff = new byte[<pick your size here>];
int readlen;
while ((readlen = inputStream.Read(buff, 0, buff.Length)) > 0)
{
    pipeStream.Write(buff, 0, readlen);
    pipeStream.WaitForPipeDrain();   // block until the reader has consumed the write
}
I get 2 writes in and then a "Pipe is broken" error on the third, pretty much no matter what the buffer size is.
Originally I tried:
inputStream.CopyTo(pipeStream);
and by default that uses an 80k buffer. Looking at Position, I could see that I'd be on the third write when it blew up.
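(Side note for anyone repeating this: CopyTo has an overload that takes a buffer size, so varying it is just one line.)

inputStream.CopyTo(pipeStream, 1024 * 1024);   // 1 MB buffer instead of the ~80k default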
Any idea what might be happening on the SQL Server side to break off after 2 reads?
EDIT: I did some playing around and found that with larger buffer sizes it would get farther. When I got above 1 meg buffers, I got almost up to 2 gig written to the pipe before it broke. Going above 5 meg buffers didn't improve things.
At first I thought the overhead of decompressing the gzip might have been making the reads lag behind the writes, so SQL Server might have closed the pipe on a timed-out read or something. I made the reads and writes async so I could get the next read going while the write was happening, but that didn't help.
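The overlapped version was essentially this double-buffered loop (a sketch of the idea rather than the exact code; inputStream and pipeStream are the same streams as above, same usings as the earlier sketch):

// Overlap the next gzip read with the previous pipe write using two buffers
static async Task CopyOverlappedAsync(Stream inputStream, Stream pipeStream, int bufferSize)
{
    var buffers = new[] { new byte[bufferSize], new byte[bufferSize] };
    int current = 0;

    int readlen = await inputStream.ReadAsync(buffers[current], 0, bufferSize);
    while (readlen > 0)
    {
        // start writing the chunk we just filled...
        Task writeTask = pipeStream.WriteAsync(buffers[current], 0, readlen);

        // ...and read the next chunk into the other buffer while the write is in flight
        current = 1 - current;
        readlen = await inputStream.ReadAsync(buffers[current], 0, bufferSize);

        await writeTask;
    }
}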
I added some timing, and it turned out the pipe writes were taking 2.5 times as much time as the gzip reads, so I don't know why the pipe ultimately breaks.
Still, almost 2 gig is a lot farther than 2-3 writes of a few hundred K.
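For reference, the timing I mentioned was just a Stopwatch around each side of the synchronous loop, along these lines:

var readTimer = new System.Diagnostics.Stopwatch();
var writeTimer = new System.Diagnostics.Stopwatch();
byte[] buff = new byte[1024 * 1024];
int readlen;

readTimer.Start();
while ((readlen = inputStream.Read(buff, 0, buff.Length)) > 0)
{
    readTimer.Stop();

    writeTimer.Start();
    pipeStream.Write(buff, 0, readlen);
    writeTimer.Stop();

    readTimer.Start();   // time the next read, which happens in the while condition
}
readTimer.Stop();

Console.WriteLine($"gzip reads: {readTimer.Elapsed}, pipe writes: {writeTimer.Elapsed}");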