
I have the following file, which holds a pipelined stream of HTTP requests and HTTP responses.

How can I parse this file into my stream variable?

type Connection struct{
   Request *http.Request
   Response *http.Response
}
stream := make([]Connection, 0)

The raw file:

GET /ubuntu/dists/trusty/InRelease HTTP/1.1
Host: archive.ubuntu.com
Cache-Control: max-age=0
Accept: text/*
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2)

HTTP/1.1 404 Not Found
Date: Thu, 26 Nov 2015 18:26:36 GMT
Server: Apache/2.2.22 (Ubuntu)
Vary: Accept-Encoding
Content-Length: 311
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /ubuntu/dists/trusty/InRelease was not found on this server.</p>
<hr>
<address>Apache/2.2.22 (Ubuntu) Server at archive.ubuntu.com Port 80</address>
</body></html>
GET /ubuntu/dists/trusty-updates/InRelease HTTP/1.1
Host: archive.ubuntu.com
Cache-Control: max-age=0
Accept: text/*
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2)

HTTP/1.1 200 OK
Date: Thu, 26 Nov 2015 18:26:37 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Thu, 26 Nov 2015 18:03:00 GMT
ETag: "fbb7-5257562a5fd00"
Accept-Ranges: bytes
Content-Length: 64439
Cache-Control: max-age=382, proxy-revalidate
Expires: Thu, 26 Nov 2015 18:33:00 GMT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Origin: Ubuntu
Label: Ubuntu
Suite: trusty-updates
Version: 14.04
Codename: trusty
[... truncated by author]

I know there is http.ReadRequest. What about the Response? Any ideas/feedback/thoughts are appreciated.

Salvador Dali
mattes

1 Answer


It's actually pretty straightforward:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "net/http/httputil"
    "os"
)

type Connection struct {
    Request  *http.Request
    Response *http.Response
}

func ReadHTTPFromFile(r io.Reader) ([]Connection, error) {
    buf := bufio.NewReader(r)
    stream := make([]Connection, 0)

    for {
        req, err := http.ReadRequest(buf)
        if err == io.EOF {
            break
        }
        if err != nil {
            return stream, err
        }

        resp, err := http.ReadResponse(buf, req)
        if err != nil {
            return stream, err
        }

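        // Buffer the body so it survives Close, then restore it for later readers.
        b := new(bytes.Buffer)
        _, err = io.Copy(b, resp.Body)
        resp.Body.Close()
        if err != nil {
            return stream, err
        }
        resp.Body = ioutil.NopCloser(b)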

        stream = append(stream, Connection{Request: req, Response: resp})
    }
    return stream, nil
}

func main() {
    f, err := os.Open("/tmp/test.http")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    stream, err := ReadHTTPFromFile(f)
    if err != nil {
        log.Fatalln(err)
    }
    for _, c := range stream {
        b, err := httputil.DumpRequest(c.Request, true)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b))
        b, err = httputil.DumpResponse(c.Response, true)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b))
    }
}

A few notes:

  • The standard library provides both http.ReadRequest and http.ReadResponse
  • Both can be called repeatedly on the same bufio.Reader until EOF, and it will "just work"
    • "Just working" depends on the Content-Length header being present and correct, so that reading the body leaves the Reader at the start of the next request/response
    • Read the code to understand exactly what will work and what won't
  • resp.Body must be closed per the docs, so we copy it into another buffer to keep it
  • Using your example data (with Content-Length modified to match your truncation), this code outputs the same requests and responses as given
  • httputil.DumpRequest and httputil.DumpResponse won't necessarily dump the HTTP headers in the same order as the input file, so don't expect a diff to be perfect
Hasibul Hasn
korylprince
  • This is great! Thank you so much. I must have missed the http.ReadResponse func. And I love the fact that calling the read func over and over again just works! – mattes Nov 30 '15 at 22:07
  • Yeah, I was surprised it worked too. But it's basically working off Content-Length, so it makes sense. It would probably be better if you had some kind of delimiter between requests/responses and used something like `io.LimitedReader` to make sure you don't get thrown off by a bad HTTP response. – korylprince Nov 30 '15 at 23:58
  • I keep getting 'unexpected EOF' – openwonk Dec 20 '19 at 00:40
  • This solution won't work with HTTP/2; `ReadResponse` fails with "malformed HTTP version HTTP/2" – zzell Jun 15 '20 at 11:24
  • @zzell seems related to [this](https://github.com/golang/go/issues/18464) – korylprince Jun 16 '20 at 04:23
  • The answer is great! I just want to add that in my case I needed to add one more line break to the end of the request text/file – Sasha Aug 19 '22 at 00:55