
I wrote a little web crawler and I know that the response is a zip file.
With my limited experience in Go programming, I only know how to unzip an existing file.
Can I unzip the Response.Body in memory, without saving it to the hard disk first?

Minusy

1 Answer


Updated answer: handling a zip file response body in memory.

Note: this approach reads the entire archive into memory, so make sure you have enough memory for the zip file.

package main

import (
    "archive/zip"
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    resp, err := http.Get("zip file url")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // An error page is not a zip archive, so stop early on a bad status.
    if resp.StatusCode != http.StatusOK {
        log.Fatalf("unexpected status: %s", resp.Status)
    }

    // zip.NewReader needs an io.ReaderAt plus the total size, which a
    // streaming body cannot provide, so buffer the whole body in memory.
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    zipReader, err := zip.NewReader(bytes.NewReader(body), int64(len(body)))
    if err != nil {
        log.Fatal(err)
    }

    // Read all the files from the zip archive
    for _, zipFile := range zipReader.File {
        fmt.Println("Reading file:", zipFile.Name)
        unzippedFileBytes, err := readZipFile(zipFile)
        if err != nil {
            log.Println(err)
            continue
        }

        _ = unzippedFileBytes // the decompressed contents of this entry
    }
}

// readZipFile opens a single archive entry and returns its decompressed bytes.
func readZipFile(zf *zip.File) ([]byte, error) {
    f, err := zf.Open()
    if err != nil {
        return nil, err
    }
    defer f.Close()
    return io.ReadAll(f)
}

By default, the Go HTTP client transparently decompresses gzip-encoded responses, so you just read and close the response body as usual.

However, there is a catch.

// Reference https://github.com/golang/go/blob/master/src/net/http/transport.go
//
// DisableCompression, if true, prevents the Transport from
// requesting compression with an "Accept-Encoding: gzip"
// request header when the Request contains no existing
// Accept-Encoding value. If the Transport requests gzip on
// its own and gets a gzipped response, it's transparently
// decoded in the Response.Body. However, if the user
// explicitly requested gzip it is not automatically
// uncompressed.
DisableCompression bool

In other words, if you add the Accept-Encoding: gzip header to the request manually, you have to decompress the gzip response body yourself.

For example:
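The catch can be observed directly. A minimal sketch using `net/http/httptest`, where `gzipServer` and `fetch` are hypothetical helpers: the test server always replies with a gzip body, and `fetch` reports `resp.Uncompressed`, which `net/http` sets to true only when it decoded the body itself. Leaving `Accept-Encoding` unset yields the decoded text; setting it manually leaves the raw gzip bytes (starting with `1f 8b`) in the body:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// gzipServer (hypothetical) starts a test server that always answers with a
// gzip-compressed payload and "Content-Encoding: gzip", whatever was asked.
func gzipServer(payload string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		gz := gzip.NewWriter(&buf)
		gz.Write([]byte(payload)) // writing to a bytes.Buffer cannot fail
		gz.Close()
		w.Header().Set("Content-Encoding", "gzip")
		w.Write(buf.Bytes())
	}))
}

// fetch (hypothetical) GETs url and returns resp.Uncompressed plus the body
// bytes exactly as the caller receives them.
func fetch(url string, manualGzip bool) (bool, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, nil, err
	}
	if manualGzip {
		// Setting the header ourselves turns off transparent decoding.
		req.Header.Set("Accept-Encoding", "gzip")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.Uncompressed, body, err
}

func main() {
	srv := gzipServer("hello, gzip")
	defer srv.Close()

	auto, body, err := fetch(srv.URL, false)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("automatic: Uncompressed=%v body=%q\n", auto, body)

	manual, raw, err := fetch(srv.URL, true)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("manual: Uncompressed=%v first bytes=% x\n", manual, raw[:2])
}
```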

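// resp is the *http.Response of a request that set "Accept-Encoding: gzip"
// manually; gzip is the "compress/gzip" package.
reader, err := gzip.NewReader(resp.Body)
if err != nil {
    log.Fatal(err)
}
defer reader.Close()

body, err := io.ReadAll(reader)
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(body))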
jeevatkm
  • Thanks for your answer. I know how to handle gzip, but I don't know how to handle a .zip file. It may need archive/zip, but I don't know how to use archive/zip to unzip a .zip Response.Body. Is gzip the same as zip? – Minusy May 26 '18 at 05:14
  • No, gzip != zip. It seems you asked about a `.zip` file, my bad. I will update the code snippet in a while. – jeevatkm May 26 '18 at 05:31