Unmarshalling Entities from multiple JSON arrays without using reflect or duplicating code

Question

I'm making a JSON API wrapper client that needs to fetch paginated results, where the URL to the next page is provided by the previous page. To reduce code duplication for the 100+ entities that share the same response format, I would like to have a single client method that fetches and unmarshals the different entities from all paginated pages.

My current approach, in a simplified (pseudo) version (without error handling, etc.):

type ListResponse struct {
    Data struct {
        Results []interface{} `json:"results"`
        Next    string        `json:"__next"`
    } `json:"d"`
}

func (c *Client) ListRequest(uri string) (listResponse ListResponse) {
    // Do a http request to uri and get the body
    body := []byte(`{ "d": { "__next": "URL", "results": []}}`)
    json.Unmarshal(body, &listResponse)
    return listResponse
}

func (c *Client) ListRequestAll(uri string, v interface{}) {
    var a []interface{}
    f := c.ListRequest(uri)
    a = append(a, f.Data.Results...)

    var next = f.Data.Next
    for next != "" {
        r := c.ListRequest(next)
        a = append(a, r.Data.Results...)
        next = r.Data.Next
    }

    b, _ := json.Marshal(a)
    json.Unmarshal(b, v)
}

// Then in a method requesting all results for a single entity
var entities []Entity1
client.ListRequestAll("https://foo.bar/entities1.json", &entities)

// and somewhere else
var entities []Entity2
client.ListRequestAll("https://foo.bar/entities2.json", &entities)

The problem, however, is that this approach is inefficient and uses too much memory: it first unmarshals into a generic ListResponse with the results as []interface{} (to read the next URL and concatenate the results into a single slice), then marshals that []interface{} only to unmarshal it again immediately afterwards into the destination slice of []Entity1.

I might be able to use the reflect package to dynamically make new slices of these entities, unmarshal directly into them, and concatenate/append them afterwards. However, if I understand correctly, I had better not use reflect unless strictly necessary...

How did you determine that this approach was inefficient and uses too much memory? What performance problems are you facing? Or is this just a guess? — Jonathan Hall, Aug 26 '18 at 15:18
@Flimzy Well, if I made this method just for a single entity (i.e. unmarshalling it directly into a response with the correct entities), it peaks at less than half of the memory in comparison to the sample above (for the time these requests last). I measured this by watching the Activity Monitor on my Mac. — nijm, Aug 26 '18 at 15:25
I'm not really sure what you're saying. In any case, trying to avoid reflection while using json is a bit of a lost cause, as json uses reflection internally. — Jonathan Hall, Aug 26 '18 at 15:26
@Flimzy I agree that doing some profiling and/or having some benchmarks would make the problem description easier and more abstract, however IMO the approach above has obvious flaws, aside from the actual results. Also I shouldn't have mentioned reflection as I don't think this should be used here and Joshua's approach below much better suits my needs ;) Anyway thanks for your input! — nijm, Aug 30 '18 at 09:22
@Flimzy My solution implicitly uses reflection via `encoding/json` but does not require the `reflect` package to be imported, and requires no explicit type assertions. — Joshua Kolden, Aug 30 '18 at 19:12

Joshua Kolden · Accepted Answer · 2018-08-30T06:54:52.273

Take a look at the RawMessage type in the encoding/json package. It allows you to defer the decoding of json values until later. For example:

Results []json.RawMessage `json:"results"`

or even...

Results json.RawMessage `json:"results"`

Since json.RawMessage is just a slice of bytes, this will be much more efficient than the intermediate []interface{} you are unmarshalling to.
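As a rough illustration of the deferred decoding, here is a minimal sketch. The Entity1 fields and the helper names decodePage and decodeEntity are assumptions made up for the example, and the HTTP request is replaced by a hard-coded body. The envelope is parsed once to get at the next URL, while each result stays as raw bytes until the concrete type is known.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entity1 is a hypothetical entity; the real fields are not given in the question.
type Entity1 struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// ListResponse keeps every result as undecoded bytes.
type ListResponse struct {
	Data struct {
		Results []json.RawMessage `json:"results"`
		Next    string            `json:"__next"`
	} `json:"d"`
}

// decodePage parses only the envelope; the results stay raw.
func decodePage(body []byte) (ListResponse, error) {
	var page ListResponse
	err := json.Unmarshal(body, &page)
	return page, err
}

// decodeEntity decodes one deferred result into the concrete type.
func decodeEntity(raw json.RawMessage) (Entity1, error) {
	var e Entity1
	err := json.Unmarshal(raw, &e)
	return e, err
}

func main() {
	// Stand-in for the body of an HTTP response.
	body := []byte(`{"d":{"__next":"","results":[{"id":1,"name":"a"},{"id":2,"name":"b"}]}}`)
	page, err := decodePage(body)
	if err != nil {
		panic(err)
	}
	for _, raw := range page.Data.Results {
		e, err := decodeEntity(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(e.ID, e.Name)
	}
}
```

The generic part (decodePage) never needs to know about Entity1, so it can be shared by all entities.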

As for the second part, on how to assemble these into a single slice given multiple page reads, you could punt that question to the caller by making the caller use a slice-of-slices type.

// Then in a method requesting all results for a single entity
var entityPages [][]Entity1
client.ListRequestAll("https://foo.bar/entities1.json", &entityPages)
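One way this could work without reflect or []interface{} is sketched below, under some assumptions: the whole results array of each page is kept as a single json.RawMessage, the raw arrays are joined into one JSON document of the form [[...],[...]], and that document is decoded once into the caller's slice of slices. The names listAllPages and cannedFetch and the Entity1 field are hypothetical, and the HTTP request is replaced by canned bodies.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Entity1 is a hypothetical entity used only for illustration.
type Entity1 struct {
	ID int `json:"id"`
}

// listResponse keeps the whole "results" array of a page as raw bytes.
type listResponse struct {
	Data struct {
		Results json.RawMessage `json:"results"`
		Next    string          `json:"__next"`
	} `json:"d"`
}

// cannedFetch is a stand-in for the HTTP request (an assumption of this sketch).
func cannedFetch(uri string) ([]byte, error) {
	bodies := map[string]string{
		"page1": `{"d":{"__next":"page2","results":[{"id":1},{"id":2}]}}`,
		"page2": `{"d":{"__next":"","results":[{"id":3}]}}`,
	}
	body, ok := bodies[uri]
	if !ok {
		return nil, fmt.Errorf("no such page: %s", uri)
	}
	return []byte(body), nil
}

// listAllPages follows the __next links and decodes every page into v,
// which must point to a slice of slices such as *[][]Entity1.
func listAllPages(fetch func(string) ([]byte, error), uri string, v interface{}) error {
	var pages [][]byte
	for uri != "" {
		body, err := fetch(uri)
		if err != nil {
			return err
		}
		var page listResponse
		if err := json.Unmarshal(body, &page); err != nil {
			return err
		}
		if len(page.Data.Results) > 0 { // skip pages without a results field
			pages = append(pages, page.Data.Results)
		}
		uri = page.Data.Next
	}
	// Build one document of the form [[...],[...]] and decode it once.
	var doc bytes.Buffer
	doc.WriteByte('[')
	doc.Write(bytes.Join(pages, []byte(",")))
	doc.WriteByte(']')
	return json.Unmarshal(doc.Bytes(), v)
}

func main() {
	var entityPages [][]Entity1
	if err := listAllPages(cannedFetch, "page1", &entityPages); err != nil {
		panic(err)
	}
	for _, page := range entityPages {
		for _, e := range page {
			fmt.Println(e.ID)
		}
	}
}
```

All raw pages are still held in memory at the same time, so this only removes the intermediate []interface{} and the extra marshal step.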

This still has the unbounded memory consumption problem your general design has, however, since you have to load all of the pages / items at once. You might want to consider changing to an Open/Read abstraction like working with files. You'd have some Open method that returns another type that, like os.File, provides a method for reading a subset of data at a time, while internally requesting pages and buffering as needed.

Perhaps something like this (untested):

type PagedReader struct {
  c *Client

  buffer []json.RawMessage

  next string
}

func (r *PagedReader) getPage() {
  f := r.c.ListRequest(r.next)
  r.next = f.Data.Next
  r.buffer = append(r.buffer, f.Data.Results...)
}

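// ReadItems decodes up to len(output) buffered items into output and returns
// the number decoded. Each element of output must be a pointer to the target type.
func (r *PagedReader) ReadItems(output []interface{}) int {
  // Fetch more pages until the buffer can fill output or no pages remain.
  for len(output) > len(r.buffer) && r.next != "" {
    r.getPage()
  }

  n := 0
  for i := 0; i < len(output) && i < len(r.buffer); i++ {
    json.Unmarshal(r.buffer[i], output[i])
    n++
  }
  r.buffer = r.buffer[n:]
  return n
}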
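To give a sense of how a caller might drive such a reader, here is a self-contained variant along the same lines. It is a sketch, not the code above verbatim: the HTTP call is replaced by a hypothetical cannedFetch, errors are returned instead of ignored, and readBatch and the Entity1 field are illustrative helpers invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entity1 is a hypothetical entity used only for illustration.
type Entity1 struct {
	ID int `json:"id"`
}

type listResponse struct {
	Data struct {
		Results []json.RawMessage `json:"results"`
		Next    string            `json:"__next"`
	} `json:"d"`
}

// pagedReader buffers raw results and requests further pages on demand.
type pagedReader struct {
	fetch  func(uri string) ([]byte, error) // stand-in for the HTTP call
	buffer []json.RawMessage
	next   string
}

// cannedFetch serves two hard-coded pages (an assumption of this sketch).
func cannedFetch(uri string) ([]byte, error) {
	bodies := map[string]string{
		"page1": `{"d":{"__next":"page2","results":[{"id":1},{"id":2}]}}`,
		"page2": `{"d":{"__next":"","results":[{"id":3}]}}`,
	}
	body, ok := bodies[uri]
	if !ok {
		return nil, fmt.Errorf("no such page: %s", uri)
	}
	return []byte(body), nil
}

func (r *pagedReader) getPage() error {
	body, err := r.fetch(r.next)
	if err != nil {
		return err
	}
	var page listResponse
	if err := json.Unmarshal(body, &page); err != nil {
		return err
	}
	r.next = page.Data.Next
	r.buffer = append(r.buffer, page.Data.Results...)
	return nil
}

// readItems decodes up to len(output) items; each element must be a pointer.
func (r *pagedReader) readItems(output []interface{}) (int, error) {
	for len(output) > len(r.buffer) && r.next != "" {
		if err := r.getPage(); err != nil {
			return 0, err
		}
	}
	n := 0
	var err error
	for ; n < len(output) && n < len(r.buffer); n++ {
		if err = json.Unmarshal(r.buffer[n], output[n]); err != nil {
			break
		}
	}
	r.buffer = r.buffer[n:] // drop only the items that were decoded
	return n, err
}

// readBatch reads at most size entities by handing readItems typed pointers.
func readBatch(r *pagedReader, size int) ([]Entity1, error) {
	items := make([]Entity1, size)
	ptrs := make([]interface{}, size)
	for i := range items {
		ptrs[i] = &items[i]
	}
	n, err := r.readItems(ptrs)
	return items[:n], err
}

func main() {
	r := &pagedReader{fetch: cannedFetch, next: "page1"}
	for {
		batch, err := readBatch(r, 2)
		if err != nil {
			panic(err)
		}
		if len(batch) == 0 {
			break
		}
		fmt.Println(batch)
	}
}
```

Because the caller supplies the pointers, the reader itself stays independent of the entity type, and only about one batch plus one page of raw results is buffered at a time.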

Joshua, this is awesome! In the end I did opt for the `json.RawMessage`. However the `PagedReader` is a really interesting approach and some food for thought in refactoring. — nijm, Aug 30 '18 at 09:09
