
I have a JSON-formatted overview of backups, generated using pgBackRest. For simplicity I removed a lot of clutter so that the main structures remain. The list can contain multiple backup structures; I reduced it to just one here for simplicity.

[
  {
    "backup": [
      {
        "archive": {
          "start": "000000090000000200000075",
          "stop": "000000090000000200000075"
        },
        "info": {
          "size": 1200934840
        },
        "label": "20220103-122051F",
        "type": "full"
      },
      {
        "archive": {
          "start": "00000009000000020000007D",
          "stop": "00000009000000020000007D"
        },
        "info": {
          "size": 1168586300
        },
        "label": "20220103-153304F_20220104-081304I",
        "type": "incr"
      }
    ],
    "name": "dbname1"
  }
]

Using jq I tried to generate a simpler format out of this, so far without any luck.

What I would like to see is backup.archive, backup.info, backup.label, backup.type and name combined in one simple structure, without running into a cartesian product. I would be very happy to get the following output:

[
  {
    "backup": [
      {
        "archive": {
          "start": "000000090000000200000075",
          "stop": "000000090000000200000075"
        },
        "name": "dbname1",
        "info": {
          "size": 1200934840
        },
        "label": "20220103-122051F",
        "type": "full"
      },
      {
        "archive": {
          "start": "00000009000000020000007D",
          "stop": "00000009000000020000007D"
        },
        "name": "dbname1",
        "info": {
          "size": 1168586300
        },
        "label": "20220103-153304F_20220104-081304I",
        "type": "incr"
      }
    ]
  }
]

where name is redundantly added to each element of the list. How can I use jq to convert the shown input into the requested output? In the end I just want to generate a simple CSV from the data. Even with the simplified structure, using

'.[].backup[].name + ":" + .[].backup[].type'

I get a cartesian product:

"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"

How can I solve that?

2 Answers


So, for each object in the top-level array you want to pull .name into each element of its .backup array, right? Then try

jq 'map(.backup[] += {name} | del(.name))'

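A self-contained check of that filter, with the question's sample data trimmed down and inlined via a here-doc:

```shell
# Each backup element gains "name": "dbname1"; the top-level name is removed.
jq 'map(.backup[] += {name} | del(.name))' <<'EOF'
[ { "backup": [ { "label": "20220103-122051F",                  "type": "full" },
                { "label": "20220103-153304F_20220104-081304I", "type": "incr" } ],
    "name": "dbname1" } ]
EOF
```

Note that in `.backup[] += {name}` the right-hand side is evaluated against the surrounding object, so `{name}` still sees the top-level .name while the assignment targets each array element.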

Then, generating CSV output with jq is easy: there is a builtin called @csv which transforms an array into a comma-separated string of its values, quoting those that are strings. So all you need to do is compose the desired values of each record into an array. At this point, removing .name is no longer necessary, as we are piecing together the array for the CSV output anyway. We also pass the -r flag to jq to produce raw text rather than JSON-encoded strings.

jq -r '.[]
  | .backup[] + {name}
  | [(.archive | .start, .stop), .name, .info.size, .label, .type]
  | @csv
'

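Applied to the sample input from the question (inlined here via a here-doc so the sketch is self-contained), this produces one CSV line per backup:

```shell
jq -r '.[]
  | .backup[] + {name}
  | [(.archive | .start, .stop), .name, .info.size, .label, .type]
  | @csv
' <<'EOF'
[ { "backup": [
      { "archive": { "start": "000000090000000200000075", "stop": "000000090000000200000075" },
        "info": { "size": 1200934840 }, "label": "20220103-122051F", "type": "full" },
      { "archive": { "start": "00000009000000020000007D", "stop": "00000009000000020000007D" },
        "info": { "size": 1168586300 }, "label": "20220103-153304F_20220104-081304I", "type": "incr" } ],
    "name": "dbname1" } ]
EOF
# "000000090000000200000075","000000090000000200000075","dbname1",1200934840,"20220103-122051F","full"
# "00000009000000020000007D","00000009000000020000007D","dbname1",1168586300,"20220103-153304F_20220104-081304I","incr"
```

Note that @csv quotes the string values but leaves numbers such as .info.size unquoted.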

pmf
  • Thanks, that was quick. My idea was to generate a simple CSV from this in the next step. I tried with '.[].backup[].name + ":" + .[].backup[].type', which obviously is wrong since I get 4 lines of output instead of the expected 2. Any idea on the finishing touch? –  Jan 05 '22 at 14:15
  • @ik_zelf Updated my answer to reflect your question regarding CSV. – pmf Jan 05 '22 at 17:03
  • Thanks very much @pmf, I learned a lot from you. I had seen many more answers but you also showed me how to address nested structures. Thanks a lot, I am happily accepting your answer. Have a beer ;-) –  Jan 05 '22 at 18:13
  • sorry for bothering again @pmf .... how can I apply max_by on archive.stop ? I tried jq -r '.[] | .backup[-1] + {name} | max_by( .archive.stop ) ' and get Cannot index string with string "archive" I must be simple ..... but it seems I am not :-( –  Jan 06 '22 at 12:46
  • `.[] | .backup[-1]` gives you (due to the initial `.[]`) a stream of objects and any subsequent filter will run as many times, each seeing exactly one of the objects output. But `max_by` needs an array from which it can pick one. The simplest way to collect the stream into an array is embracing the expression with brackets `[.[] | .backup[-1]] | max…` but as you are decomposing the initial array in the first place, you could also directly apply `map` to it `map(.backup[-1]) | max…`. Note: I left out `{name}` for brevity here; add it whereever you need it, you know how to do it now :) – pmf Jan 06 '22 at 14:09
  • Yet another thing: `.archive.stop` (as in the sample data) is a string. You *can* compare strings using `max_by` but then the comparison will be alphabetically rather than numerically, i.e. `"12"` will come before `"3"`. To have a true numeric comparison, convert it using `tonumber` for the aggregator, that is use `max_by(.archive.stop | tonumber)` – pmf Jan 06 '22 at 14:09
  • thanks again! I changed "| .backup[-1] + {name}" to "| [ .backup[] + {name} ]", after which I could apply "max_by(.archive.stop)" to get the desired result. The -1 also works, but it assumes an ordering in the array which I am not sure about, so using max_by looks a bit safer. –  Jan 06 '22 at 14:50
  • You introduced the `.backup[-1]` yourself in your [question](https://stackoverflow.com/questions/70593955/how-to-denormalise-this-json-structure/70594060?noredirect=1#comment124815911_70594060) and you used it together with `max_by`. That's why I assumed you want to aggregate over all last items within the outermost array. But as you've got the techniques now, I'm sure you'll find a way to get what you actually want. :) – pmf Jan 06 '22 at 14:57
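Putting the comment thread together: collect each database's backups with name merged in, then pick the entry with the greatest archive.stop. One caveat to the tonumber suggestion above: the sample archive.stop values are hexadecimal WAL segment names (note the "D" in "…7D"), which tonumber cannot parse; since they are fixed-width, plain string comparison already orders them correctly. A self-contained sketch with the sample data inlined:

```shell
# Pick the backup with the highest archive.stop per database,
# keeping the merged-in name. String comparison is fine here
# because WAL segment names are fixed-width hex.
jq '.[] | [ .backup[] + {name} ] | max_by(.archive.stop)' <<'EOF'
[ { "backup": [
      { "archive": { "start": "000000090000000200000075", "stop": "000000090000000200000075" },
        "info": { "size": 1200934840 }, "label": "20220103-122051F", "type": "full" },
      { "archive": { "start": "00000009000000020000007D", "stop": "00000009000000020000007D" },
        "info": { "size": 1168586300 }, "label": "20220103-153304F_20220104-081304I", "type": "incr" } ],
    "name": "dbname1" } ]
EOF
```

This selects the "incr" entry (label "20220103-153304F_20220104-081304I") with "name": "dbname1" included.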

First navigate to each element of the top-level array, and only then “print” the fields you're interested in:

.[] | .name + ":" + .backup[].type
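Keeping the filter anchored at each top-level object, which still carries .name, before descending into .backup avoids the duplicated rows. A self-contained check with trimmed sample data:

```shell
jq -r '.[] | .name + ":" + .backup[].type' <<'EOF'
[ { "backup": [ { "type": "full" }, { "type": "incr" } ], "name": "dbname1" } ]
EOF
# dbname1:full
# dbname1:incr
```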
Arie