4

I'm having some trouble with this, despite finding examples. I think it may be an encoding problem, but I'm just not sure. I am trying to programmatically download a file from an HTTPS server that uses cookies (hence I'm using HttpWebRequest). I'm debug-printing the capacity of the streams to check, but the raw output files look different. I have tried other encodings to no avail.

Code:

    Sub downloadzip(strURL As String, strDestDir As String)

    Dim request As HttpWebRequest
    Dim response As HttpWebResponse

    request = Net.HttpWebRequest.Create(strURL)
    request.UserAgent = strUserAgent
    request.Method = "GET"
    request.CookieContainer = cookieJar
    response = request.GetResponse()

    If response.ContentType = "application/zip" Then
        Debug.WriteLine("Is Zip")
    Else
        Debug.WriteLine("Is NOT Zip: is " + response.ContentType.ToString)
        Exit Sub
    End If

    Dim intLen As Int64 = response.ContentLength
    Debug.WriteLine("response length: " + intLen.ToString)

    Using srStreamRemote As StreamReader = New StreamReader(response.GetResponseStream(), Encoding.Default)
        'Using ms As New MemoryStream(intLen)
        Dim fullfile As String = srStreamRemote.ReadToEnd

        Dim memstream As MemoryStream = New MemoryStream(New UnicodeEncoding().GetBytes(fullfile))

        'test write out to file
        Dim data As Byte() = memstream.ToArray()
        Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
            filestrm.Write(data, 0, data.Length)
        End Using

        Debug.WriteLine("Memstream capacity " + memstream.Capacity.ToString)
        'Dim strData As String = srStreamRemote.ReadToEnd
        memstream.Seek(0, 0)
        Dim buffer As Byte() = New Byte(2048) {}
        Using zip As New ZipInputStream(memstream)
            Debug.WriteLine("zip stream cap " + zip.Length.ToString)
            zip.Seek(0, 0)
            Dim e As ZipEntry

            Dim flag As Boolean = True
            Do While flag ' daft, but won't assign e=zip... tries to evaluate
                e = zip.GetNextEntry
                If IsNothing(e) Then
                    flag = False
                    Exit Do
                Else
                    e.UseUnicodeAsNecessary = True
                End If

                If Not e.IsDirectory Then
                    Debug.WriteLine("Writing out " + e.FileName)
                    '    e.Extract(strDestDir)

                    Using output As FileStream = File.Open(Path.Combine(strDestDir, e.FileName), _
                                                          FileMode.Create, FileAccess.ReadWrite)
                        Dim n As Integer
                        Do While (n = zip.Read(buffer, 0, buffer.Length) > 0)
                            output.Write(buffer, 0, n)
                        Loop
                    End Using

                End If
            Loop
        End Using
        'End Using
    End Using 'srStreamRemote.Close()
    response.Close()
End Sub

So I get the right size file downloaded, but DotNetZip does not recognise it, and the files that get copied out are incomplete or invalid zips. I've spent most of today on this and am ready to give up.

Mark

3 Answers

5

I think the answer will be to break down the problem, and perhaps change a couple of aspects of the code.

For example, let's get rid of converting the response stream to a string:

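Dim memStream As MemoryStream
Using rdr As System.IO.Stream = response.GetResponseStream()
    ' Assumes the server sent a Content-Length header (ContentLength is -1 otherwise).
    Dim count = Convert.ToInt32(response.ContentLength)
    ' VB array declarations take the upper bound, so count - 1 gives exactly count bytes.
    Dim buffer = New Byte(count - 1) {}
    Dim bytesRead As Integer = 0
    Do While bytesRead < count
        Dim n = rdr.Read(buffer, bytesRead, count - bytesRead)
        If n = 0 Then Exit Do ' stream ended early; avoid looping forever
        bytesRead += n
    Loop
    memStream = New MemoryStream(buffer, 0, bytesRead)
End Using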

Next, there's an easier way to write the contents of a memory stream to a file. Your code

Dim data As Byte() = memstream.ToArray()
Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
    filestrm.Write(data, 0, data.Length)
End Using

can be replaced with

Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
    memstream.WriteTo(filestrm)
End Using

That avoids copying the memory stream into a second byte array and then writing that array to the file. The memory stream writes its contents directly to the file stream, with no intermediate buffer.

I'll admit I haven't worked with the Zip/compression libraries you're using, but with the above amendments you have removed unnecessary transfers between streams, byte arrays, strings, etc, and hopefully eliminated the encoding issues you were having.
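To illustrate why the string round trip is a problem, here is a minimal sketch that decodes a few bytes with Encoding.Default and re-encodes them with UnicodeEncoding, as the question's code does. The module name and sample bytes are made up for the demonstration; the exact output depends on the machine's default encoding, so only the comparison matters.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module RoundTripDemo
    Sub Main()
        ' Sample data (assumption): the zip local-file-header signature "PK" 03 04,
        ' followed by two non-ASCII bytes such as compressed data might contain.
        Dim original As Byte() = {&H50, &H4B, &H3, &H4, &HFF, &H80}

        ' Decode the bytes as text, then re-encode as UTF-16, mirroring the question's code.
        Dim asText As String = Encoding.Default.GetString(original)
        Dim roundTripped As Byte() = New UnicodeEncoding().GetBytes(asText)

        Console.WriteLine("original length:      " & original.Length)
        Console.WriteLine("round-tripped length: " & roundTripped.Length)
        Console.WriteLine("identical:            " & original.SequenceEqual(roundTripped))
    End Sub
End Module
```

UTF-16 uses at least two bytes per character, so the re-encoded array cannot match the original bytes, which is why reading the response as raw bytes is the safer route for binary data.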

Give that a try and let us know how you get on. Consider attempting to open the file that you saved ("C:\temp\debug.zip") to see if it is listed as corrupt. If not, then you know at least as far as that in the code, it is working ok.
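If you'd rather check the saved file in code than open it by hand, a rough sketch is to look at its first four bytes. LooksLikeZip is a hypothetical helper, not part of any library, and it is only a quick sanity check: it tests for the local-file-header signature and says nothing about whether the rest of the archive is intact (an empty archive, for instance, starts with a different signature).

```vbnet
Imports System
Imports System.IO

Module ZipSignatureCheck
    ' Hypothetical helper: True if the file starts with the bytes "PK" 03 04,
    ' the signature of a zip local file header.
    Function LooksLikeZip(filePath As String) As Boolean
        Dim header(3) As Byte ' upper bound 3, so four bytes
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        Return header(0) = &H50 AndAlso header(1) = &H4B AndAlso
               header(2) = &H3 AndAlso header(3) = &H4
    End Function

    Sub Main()
        ' Path taken from the question's debug output.
        Console.WriteLine(LooksLikeZip("c:\temp\debug.zip"))
    End Sub
End Module
```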

Smudge202
  • Brilliant! I was getting very frustrated with the stream reader; I hadn't thought of just using the stream... seems obvious now. I now have a working zip file, which is written out as a copy for the very reason you suggest, to make sure it's valid. Shame the zip part is still coming up with an error for the file header :( – Mark Jul 07 '11 at 16:28
  • Perhaps you can create a second question for the header exception you're getting. I'm heading home now but I'll take a look once there. Good luck! – Smudge202 Jul 07 '11 at 16:33
2

I thought I'd post my full working solution to my own question. It combines the two excellent replies I've had; thank you, guys.

Sub downloadzip(strURL As String, strDestDir As String)
    Try

        Dim request As HttpWebRequest
        Dim response As HttpWebResponse

        request = Net.HttpWebRequest.Create(strURL)
        request.UserAgent = strUserAgent
        request.Method = "GET"
        request.CookieContainer = cookieJar
        response = request.GetResponse()

        If response.ContentType = "application/zip" Then
            Debug.WriteLine("Is Zip")
        Else
            Debug.WriteLine("Is NOT Zip: is " + response.ContentType.ToString)
            Exit Sub
        End If

        Dim intLen As Int32 = response.ContentLength
        Debug.WriteLine("response length: " + intLen.ToString)

        Dim memStream As MemoryStream
        Using stmResponse As IO.Stream = response.GetResponseStream()
            'Using ms As New MemoryStream(intLen)

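            ' VB array declarations take the upper bound, so subtract 1 to get exactly intLen bytes.
            Dim buffer = New Byte(intLen - 1) {}

            Dim bytesRead As Integer = 0
            Do While bytesRead < intLen
                Dim n As Integer = stmResponse.Read(buffer, bytesRead, intLen - bytesRead)
                If n = 0 Then Exit Do ' stream ended early; avoid looping forever
                bytesRead += n
            Loop

            memStream = New MemoryStream(buffer, 0, bytesRead)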

            Dim res As Boolean = False
            res = ZipExtracttoFile(memStream, strDestDir)

        End Using 'srStreamRemote.Close()
        response.Close()



    Catch ex As Exception
        'to do :)
    End Try
End Sub


Function ZipExtracttoFile(strm As MemoryStream, strDestDir As String) As Boolean

    Try
        Using zip As ZipFile = ZipFile.Read(strm)
            For Each e As ZipEntry In zip

                e.Extract(strDestDir)

            Next
        End Using
    Catch ex As Exception
        Return False
    End Try

    Return True

End Function
Mark
1

You can download into a MemoryStream, then examine it:

Public Sub Download(url as String)
    Dim req As HttpWebRequest = System.Net.WebRequest.Create(url)
    req.Method = "GET"
    Dim resp As HttpWebResponse = req.GetResponse()
    If resp.ContentType = "application/zip" Then
        Console.Error.Write("The result is a zip file.")
        Dim length As Int64 = resp.ContentLength
        If length = -1 Then
            Console.Error.WriteLine("... length unspecified")
            length = 16 * 1024
        Else
            Console.Error.WriteLine("... has length {0}", length)
        End If
        Dim ms As New MemoryStream
        CopyStream(resp.GetResponseStream(), ms)  '' **see note below!!!!
        '' list contents of the zip file
        ms.Seek(0,SeekOrigin.Begin)
        Using zip As ZipFile = ZipFile.Read (ms)
            Dim e As ZipEntry
            Console.Error.WriteLine("Entries:")
            Console.Error.WriteLine("  {0,22}  {1,10}  {2,12}", _
                                    "Name", "compressed", "uncompressed")
            Console.Error.WriteLine("----------------------------------------------------")
            For Each e In zip
                Console.Error.WriteLine("  {0,22}  {1,10}  {2,12}", _
                                        e.FileName, _
                                        e.CompressedSize, _
                                        e.UncompressedSize)
            Next
        End Using
    Else
        Console.Error.WriteLine("The result is Not a zip file.")
        CopyStream(resp.GetResponseStream(), Console.OpenStandardOutput)
    End If
End Sub


Private Shared Sub CopyStream(input As Stream, output As Stream)
    Dim buffer(32768 - 1) As Byte
    Dim n As Int32
    Do
        n = input.Read(buffer, 0, buffer.Length)
        If n = 0 Then Exit Do
        output.Write(buffer, 0, n)
    Loop
End Sub

EDIT

Just one note: I would not advise using this code (this approach) if the zip file is very large. How large is "very large"? Well, that depends, of course. The code I suggested above downloads the file into a memory stream, which means the entire contents of the zip file are held in memory. If it is a 28 KB zip file, then there's no problem. But if it is a 2 GB zip file, then you may have a big problem.

In that case you will want to stream it to a temporary file on disk, not to a MemoryStream. I'll leave that as an exercise for the reader.

The above will work for "reasonably sized" zip files, where "reasonable" depends on your machine configuration and application scenario.
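For the large-file case, one possible shape for the temp-file approach is sketched below. It assumes .NET 4 or later (for Stream.CopyTo) and the DotNetZip library used elsewhere in this thread; DownloadZipViaTempFile is a hypothetical name, and cookies, content-type checks, and error reporting are left out.

```vbnet
Imports System.IO
Imports System.Net
Imports Ionic.Zip ' DotNetZip (assumed to be referenced by the project)

Module TempFileDownload
    ' Hypothetical helper: streams the response body to a temporary file,
    ' so the whole archive is never held in memory, then extracts it.
    Sub DownloadZipViaTempFile(url As String, destDir As String)
        Dim req = DirectCast(WebRequest.Create(url), HttpWebRequest)
        req.Method = "GET"

        Dim tempPath As String = Path.GetTempFileName()
        Try
            Using resp = DirectCast(req.GetResponse(), HttpWebResponse)
                Using body As Stream = resp.GetResponseStream()
                    Using fs As New FileStream(tempPath, FileMode.Create, FileAccess.Write)
                        body.CopyTo(fs) ' copies in chunks; no full-size buffer needed
                    End Using
                End Using
            End Using

            ' Read the archive from disk rather than from a MemoryStream.
            Using zip As ZipFile = ZipFile.Read(tempPath)
                For Each entry As ZipEntry In zip
                    entry.Extract(destDir, ExtractExistingFileAction.OverwriteSilently)
                Next
            End Using
        Finally
            File.Delete(tempPath) ' clean up the temporary file either way
        End Try
    End Sub
End Module
```

The trade-off is extra disk I/O in exchange for memory use that stays flat regardless of the archive size.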

Cheeso
  • Amazing, thank you. This structure is much neater, and works so well! – Mark Jul 15 '11 at 10:31
  • glad you liked it; I added a note just now regarding memory usage to my answer; you may want to check it out. – Cheeso Jul 15 '11 at 20:34