
Given a JSON stream of the following form:

{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }

I would like to sum the values in each of the objects and output a single object, namely:

{ "a": 60, "b": 63 }

I'm guessing this will probably require flattening the above list of objects into an array of [name, value] pairs and then summing the values using reduce, but the documentation of the syntax for using reduce is woeful.

peak
Alan Burlison

3 Answers


Unless your jq has inputs, you will have to slurp the objects up using the -s flag. Then you'll have to do a fair amount of manipulation:

  1. Each of the objects needs to be mapped out to key/value pairs
  2. Flatten the pairs to a single array
  3. Group up the pairs by key
  4. Map out each group accumulating the values to a single key/value pair
  5. Map the pairs back to an object

map(to_entries)
    | add
    | group_by(.key)
    | map({
          key: .[0].key,
          value: map(.value) | add
      })
    | from_entries
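A minimal shell sketch of the five steps above, assuming jq is on the PATH and using the question's three objects as the input stream. It first prints the intermediate result after steps 1-3, so you can see what group_by hands to the final map, and then runs the full pipeline.

```shell
#!/bin/sh
# Sample input from the question: three separate objects, not an array.
input='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'

# Steps 1-3 only: entries from every object, concatenated, then grouped by key.
grouped=$(printf '%s\n' "$input" | jq -c -s 'map(to_entries) | add | group_by(.key)')
echo "$grouped"

# Steps 1-5: -s slurps the stream into one array before the pipeline runs.
summed=$(printf '%s\n' "$input" | jq -c -s '
  map(to_entries)
  | add
  | group_by(.key)
  | map({key: .[0].key, value: map(.value) | add})
  | from_entries')
echo "$summed"   # {"a":60,"b":63}
```

The flatten tweak mentioned in the comments works as well: the entries are objects, so flatten on the array of entry arrays yields the same six entries that add produces.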

With jq 1.5, this can be greatly improved: you can do away with slurping and just read the inputs directly.

$ jq -n '
reduce (inputs | to_entries[]) as {$key,$value} ({}; .[$key] += $value)
' input.json

Since we're simply accumulating all the values in each of the objects, it's easier to run through the key/value pairs of all the inputs and add them all up.
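A small runnable sketch of the reduce version, assuming a jq recent enough to have inputs (1.5 or later, as noted above) and using the question's input. For comparison it also runs the same reduce over a slurped array (-s with .[]), which produces the same object without needing inputs.

```shell
#!/bin/sh
input='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'

# -n starts from null; `inputs` then yields each object of the stream in turn.
# .[$key] starts out null, and null + number is just the number.
summed=$(printf '%s\n' "$input" | jq -c -n '
  reduce (inputs | to_entries[]) as {$key, $value} ({}; .[$key] += $value)')
echo "$summed"    # {"a":60,"b":63}

# Same reduction, but over the slurped array instead of `inputs`.
slurped=$(printf '%s\n' "$input" | jq -c -s '
  reduce (.[] | to_entries[]) as {$key, $value} ({}; .[$key] += $value)')
echo "$slurped"   # {"a":60,"b":63}
```

The -n/inputs form does not have to build the whole array in memory first, which is the practical advantage over slurping for large streams.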

peak
Jeff Mercado
  • Ah, the group_by is what I was missing, I'd overlooked it in the documentation. Also, I got the tool producing the source data to output as a single JSON stream so '-s' wasn't needed. The only other tweak I made was to use 'flatten' instead of 'add'. Thanks for the answer, it was spot on :-) – Alan Burlison Feb 12 '15 at 22:00
  • Reduce is awesome! I'm struggling with how it might support 2 methods (sum and avg, for instance). It doesn't like semicolons, and commas seem to run only the last manipulation: reduce .usage."os:linux"[] as $item ( {"credits":0,"minutes":0}; ."credits" += $item.credits; ."minutes" += $item.amount /1000/60 ) – Eddie Nov 21 '18 at 19:55
    @Eddie: semicolons are used for separating parameters to function calls. The way you're using it there, it should be pipes and it would work like you would expect. – Jeff Mercado Nov 22 '18 at 07:48
  • Shouldn't `jq -s 'reduce (.[] | to_entries[]) ...` work just as well as `jq -n 'reduce (inputs | to_entries[]) ...`? I'm missing the need for the mapping, `group_by`, etc. Or does lacking `inputs` imply a jq version lacking other constructs in the `-n` example? – Watson Sep 16 '20 at 12:10
  • For those working with a stream already in jq, make sure to use `[ ... ]` around the stream before `map(to_entries)` to convert it to an array. – ggorlen Aug 05 '21 at 02:55
  • @ggorlen, if you're going to stream it, may as well write it as `[... | to_entries]` – Jeff Mercado Aug 05 '21 at 03:16
  • Finally found a compelling use for reduce! Thanks @JeffMercado. I was trying to add a total key in the same iteration using the group_by method and getting very confused. With reduce it's trivial to manipulate arbitrary keys on each iteration. – Iain Samuel McLean Elder Aug 18 '21 at 14:28
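To illustrate the pipes-not-semicolons point from the exchange above, here is a sketch with several accumulators in one reduce. The path and field names (usage, os:linux, credits, amount) come from Eddie's comment; the sample values are made up. A count is carried along so an average can be computed after the reduce, which is one possible way to get both a sum and an average.

```shell
#!/bin/sh
# Hypothetical data shaped like the path in the comment; values are invented.
data='{"usage": {"os:linux": [
  {"credits": 2, "amount": 60000},
  {"credits": 3, "amount": 120000}]}}'

# Chain the per-item updates with | so each one sees the previous result.
out=$(printf '%s\n' "$data" | jq -c '
  reduce .usage["os:linux"][] as $item
    ({"credits": 0, "minutes": 0, "count": 0};
     .credits += $item.credits
     | .minutes += ($item.amount / 1000 / 60)
     | .count += 1)
  | .avg_credits = (.credits / .count)')
echo "$out"   # {"credits":5,"minutes":3,"count":2,"avg_credits":2.5}
```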

I faced the same question when I listed all artifacts from GitHub (see here for details) and wanted to sum their sizes.

curl https://api.github.com/repos/:owner/:repo/actions/artifacts \
     -H "Accept: application/vnd.github.v3+json" \
     -H "Authorization: token <your_pat_here>" \
     | jq '.artifacts | map(.size_in_bytes) | add'

Input:

{
  "total_count": 3,
  "artifacts": [
    {
      "id": 1,
      "node_id": "MDg6QXJ0aWZhY3QyNzUxNjI1",
      "name": "artifact-1",
      "size_in_bytes": 1,
      "url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751625",
      "archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751625/zip",
      "expired": false,
      "created_at": "2020-03-10T18:21:23Z",
      "updated_at": "2020-03-10T18:21:24Z"
    },
    {
      "id": 2,
      "node_id": "MDg6QXJ0aWZhY3QyNzUxNjI0",
      "name": "artifact-2",
      "size_in_bytes": 2,
      "url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751624",
      "archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2751624/zip",
      "expired": false,
      "created_at": "2020-03-10T18:21:23Z",
      "updated_at": "2020-03-10T18:21:24Z"
    },
    {
      "id": 3,
      "node_id": "MDg6QXJ0aWZhY3QyNzI3NTk1",
      "name": "artifact-3",
      "size_in_bytes": 3,
      "url": "https://api.github.com/repos/docker/mercury-ui/actions/artifacts/2727595",
      "archive_download_url": "https://api.github.com/repos/:owner/:repo/actions/artifacts/2727595/zip",
      "expired": false,
      "created_at": "2020-03-10T08:46:08Z",
      "updated_at": "2020-03-10T08:46:09Z"
    }
  ]
}

Output:

6
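To try the filter without calling the API, here is a sketch that feeds jq a trimmed, hypothetical stand-in for the response above (only the fields the filter touches). It also shows an equivalent array-construction form and a `// 0` fallback, since add on an empty array returns null rather than 0.

```shell
#!/bin/sh
# Hypothetical, trimmed copy of the response shape shown above.
response='{"total_count": 3, "artifacts": [
  {"id": 1, "name": "artifact-1", "size_in_bytes": 1},
  {"id": 2, "name": "artifact-2", "size_in_bytes": 2},
  {"id": 3, "name": "artifact-3", "size_in_bytes": 3}]}'

# The filter from the answer.
total=$(printf '%s\n' "$response" | jq '.artifacts | map(.size_in_bytes) | add')
echo "$total"         # 6

# Equivalent: collect the sizes into an array, then add.
total2=$(printf '%s\n' "$response" | jq '[.artifacts[].size_in_bytes] | add')
echo "$total2"        # 6

# With no artifacts, add yields null; `// 0` substitutes 0 instead.
empty_total=$(printf '%s\n' '{"total_count": 0, "artifacts": []}' \
  | jq '.artifacts | map(.size_in_bytes) | add // 0')
echo "$empty_total"   # 0
```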
Oleg Burov

Another approach, which illustrates the power of jq quite nicely, is to use a filter named "sum" defined as follows:

def sum(f): reduce .[] as $row (0; . + ($row|f) );

To solve the particular problem at hand, one could then use the -s (--slurp) option as mentioned above, together with the expression:

{"a": sum(.a), "b": sum(.b) }  # (2)
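Putting the sum definition and expression (2) together, a runnable sketch on the question's input (assuming jq is on the PATH) looks like this:

```shell
#!/bin/sh
input='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'

# -s turns the stream into an array, so .[] inside sum walks the objects.
out=$(printf '%s\n' "$input" | jq -c -s '
  def sum(f): reduce .[] as $row (0; . + ($row|f));
  {"a": sum(.a), "b": sum(.b)}')
echo "$out"   # {"a":60,"b":63}
```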

The expression labeled (2) only computes the two specified sums, but it is easy to generalize, e.g. as follows:

# Produce an object with the same keys as the first object in the 
# input array, but with values equal to the sum of the corresponding
# values in all the objects.
def sumByKey:
  . as $in
  | reduce (.[0] | keys)[] as $key
    ( {}; . + {($key): ($in | sum(.[$key]))})
;
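A sketch of sumByKey in use, again with -s on the question's input. The second run uses a made-up input to show the caveat from the comment in the definition: only keys present in the first object are summed, so the extra key "c" is dropped.

```shell
#!/bin/sh
prog='
  def sum(f): reduce .[] as $row (0; . + ($row|f));
  def sumByKey:
    . as $in
    | reduce (.[0] | keys)[] as $key
        ({}; . + {($key): ($in | sum(.[$key]))});
  sumByKey'

input='{ "a": 10, "b": 11 } { "a": 20, "b": 21 } { "a": 30, "b": 31 }'
out=$(printf '%s\n' "$input" | jq -c -s "$prog")
echo "$out"       # {"a":60,"b":63}

# Hypothetical input: "c" appears only in the second object, so it is ignored.
partial=$(printf '%s\n' '{"a": 1} {"a": 2, "c": 5}' | jq -c -s "$prog")
echo "$partial"   # {"a":3}
```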
peak