I have the following array of objects (this is just an excerpt; the actual objects are also bigger):

[{
    "DATE": "10.10.2017 01:00",
    "ID": "X",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "X",
    "VALUE_ONE": 30,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "X",
    "VALUE_ONE": 25,
    "VALUE_TWO": 2
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Y",
    "VALUE_ONE": 10,
    "VALUE_TWO": 9
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Y",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Y",
    "VALUE_ONE": 50,
    "VALUE_TWO": 5
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Z",
    "VALUE_ONE": 55,
    "VALUE_TWO": 3
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Z",
    "VALUE_ONE": 60,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Z",
    "VALUE_ONE": 15,
    "VALUE_TWO": 7
  }
]

To simplify this for a web application, and also to reduce file size, I would like to convert the "VALUE_ONE", "VALUE_TWO" and "DATE" values into arrays per "ID", like this:

[{
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "X",
    "VALUE_ONE": [20, 30, 25],
    "VALUE_TWO": [5, 7, 2]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Y",
    "VALUE_ONE": [10, 20, 50],
    "VALUE_TWO": [9, 5, 5]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Z",
    "VALUE_ONE": [55, 60, 15],
    "VALUE_TWO": [3, 7, 7]
  }
]

It is important to still be able to find the values linked to a certain time (date). Since the input "DATE" values are consecutive, you most probably do not need the "DATE" value anymore to look up a requested "VALUE.." entry; you can probably just use the array index for that (index=0 is always 10.10.2017 01:00, index=1 is ... 02:00, etc.). Is it possible to do it like that? That would keep the file size even smaller. Thanks!
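For example, even with the "DATE" arrays kept, a lookup for a given timestamp could look roughly like this in jq (just a sketch; `$d` stands for the requested date string):

```jq
.[] | (.DATE | index($d)) as $i
    | {ID, VALUE_ONE: .VALUE_ONE[$i], VALUE_TWO: .VALUE_TWO[$i]}
```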

peak

4 Answers


With a 2-step reduce (it doesn't look beautiful, but it works):

jq 'reduce group_by(.ID)[] as $a ([];
      . + [reduce $a[] as $o ({"DATE": [], "VALUE_ONE": [], "VALUE_TWO": []};
             .DATE += [$o.DATE]
             | .ID = $o.ID
             | .VALUE_ONE += [$o.VALUE_ONE]
             | .VALUE_TWO += [$o.VALUE_TWO])])' input.json

The output:

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ],
    "ID": "X"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ],
    "ID": "Y"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ],
    "ID": "Z"
  }
]
RomanPerekhrest

The following solution avoids group_by for two reasons:

  • efficiency
  • the sort used by group_by in jq version 1.5 might not be stable, which complicates things.

Instead, we use `bucketize`, defined as follows:

def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x] );

To keep things simple, we will also define the following helper function:

# compactify an array with a single ID
def compact:
  . as $in
  | reduce (.[0]|keys_unsorted[]) as $key ({};
      . + {($key): $in|map(.[$key])})
    + {"ID": .[0].ID}
    ;

Solution

[bucketize(.ID)[] | compact]

This will ensure things are OK even if the set of dates differ across IDs, and even if the JSON objects are not grouped initially by date.

(If you want to drop "DATE" altogether in the final results, then replace the call to compact by compact | del(.DATE) in the line above.)
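Putting the pieces together (and dropping "DATE", as the question suggests), the whole program could be run along these lines:

```jq
def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x] );

# compactify an array with a single ID
def compact:
  . as $in
  | reduce (.[0]|keys_unsorted[]) as $key ({};
      . + {($key): $in|map(.[$key])})
    + {"ID": .[0].ID};

[bucketize(.ID)[] | compact | del(.DATE)]
```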

Output

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "X",
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Y",
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Z",
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ]
  }
]
peak

Here is a solution using reduce, setpath, getpath, del, and symbolic variable destructuring. It collects the values for all keys other than ID and DATE (eliminating the need to hardcode VALUE_ONE, etc.) into parallel arrays.

reduce (.[] | [.ID, .DATE, del(.ID, .DATE)]) as [$id, $date, $v] ({};
    # $idx is the next free slot in this ID's arrays
    (getpath([$id, "DATE"]) | length) as $idx
  | setpath([$id, "ID"]; $id)
  | setpath([$id, "DATE", $idx]; $date)
  # append every remaining value under its own key, in parallel
  | reduce ($v | keys[]) as $k (.; setpath([$id, $k, $idx]; $v[$k]))
)
| map(.)    # turn the per-ID object map into an array of its values

jq170727

If your data set is small enough, you could just group the objects by ID and map each group to the desired shape. It won't be as efficient as a streaming solution, but it is the simplest to implement using builtins.

group_by(.ID) | map({
    DATE: map(.DATE),
    ID: .[0].ID,
    VALUE_ONE: map(.VALUE_ONE),
    VALUE_TWO: map(.VALUE_TWO)
})
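If, as the question suggests, the array index alone is enough to identify the hour, the "DATE" arrays can simply be left out of the same map:

```jq
group_by(.ID) | map({
    ID: .[0].ID,
    VALUE_ONE: map(.VALUE_ONE),
    VALUE_TWO: map(.VALUE_TWO)
})
```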
Jeff Mercado
  • `group_by` will only produce correct results if the underlying sort is stable, which is not always the case with jq 1.5 and earlier. – peak Nov 14 '17 at 00:20