0

I have the following JSON file that I would like to parse with jq tool that someone suggested me but I'm new with it. There are 3 parents nodes with the same children names. The parent nodes are MNRs, GNRs and MSNRs and each of them has children named N1, N2, NR_i, NR_f.

{
  "Main": {
    "Document": "Doc.1",
    "Cini": "DDFR",
    "List": {
      "SubList": {
        "CdTa": "ABC",
        "NN": "XYZ",
        "ND": {
          "RiS": {
            "RiN": {
              "NSE14": {
                "MNRs": {
                  "MRD": [
                    {
                      "NR": {
                        "N1": "393",
                        "N2": "720",
                        "SNR": {
                          "NR_i": "203",
                          "NR_f": "49994"
                        }
                      }
                    },
                    {
                      "NR": {
                        "N1": "687",
                        "N2": "345",
                        "SNR": {
                          "NR_i": "55005",
                          "NR_f": "1229996"
                        }
                      }
                    }
                  ]
                },
                "GNRs": {
                  "RD": {
                    "NR": {
                      "N1": "649",
                      "N2": "111",
                      "SNR": {
                        "NR_i": "55400",
                        "NR_f": "877"
                      }
                    }
                  }
                },
                "MSNRs": {
                  "NR": [
                    {
                      "N1": "748",
                      "N2": "5624",
                      "SNR": {
                        "NR_i": "8746",
                        "NR_f": "7773"
                      }
                    },
                    {
                      "N1": "124",
                      "N2": "54",
                      "SNR": {
                        "NR_i": "8847",
                        "NR_f": "5526"
                      }
                    }
                  ]
                }
              },
              "NSE12": {
                "MBB": "990",
                "MRB": "123"
              },
              "MGE13": {
                "TBB": "849",
                "TRB": "113"
              }
            }
          }
        }
      }
    }
  }
}

With this code I get the following

.Main.List.SubList.ND.RiS.RiN.NSE14.MNRs.MRD

[
  {
    "NR": {
      "N1": "393",
      "N2": "720",
      "SNR": {
        "NR_i": "203",
        "NR_f": "49994"
      }
    }
  },
  {
    "NR": {
      "N1": "687",
      "N2": "345",
      "SNR": {
        "NR_i": "55005",
        "NR_f": "1229996"
      }
    }
  }
]

And with these commands I get the a columns of individual values for each children and others null.

.. | .N1?
.. | .N2?
.. | .NR_i?
.. | .NR_f?

I'm far from my desired output since I'd like to extract the children for each parent and tabulate in the form below.

+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
|             MNRs              |          GNRs            |           MSNRs          |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| N1   | N2   | NR_i  | NR_f    | N1  | N2  | NR_i  | NR_f | N1  | N2   | NR_i | NR_f |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 393  | 720  | 203   | 49994   | 649 | 111 | 55400 | 877  | 748 | 5624 | 8746 | 7773 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 687  | 345  | 55005 | 1229996 |     |     |       |      | 124 | 54   | 8847 | 5526 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+ 

May someone help me with this. Thanks in advance.

peak
  • 105,803
  • 17
  • 152
  • 177
Ger Cas
  • 2,188
  • 2
  • 18
  • 45

1 Answers1

3

Since the nature of the input JSON has only been given by example, let's begin by defining a filter for linearizing .NR:

# Produce a stream of arrays
def linearize:
  if type == "array" then .[] | linearize
  else [ .N1, .N2, .SNR.NR_i, .SNR.NR_f]
  end;

The relevant data can now be extracted while preserving the top-level groups as follows:

.Main.List.SubList.ND.RiS.RiN.NSE14
| [to_entries[]
| [.key]
  + [.value | .. | objects | select(has("NR")) | .NR | [ linearize ]] ]

Because the input JSON is not uniform, it will help to ensure uniformity by augmenting the above pipeline with the following mapping:

| map(if length > 2 then [.[0], [.[1:][][]]] else . end)

This produces a single JSON array structured like this:

[["MNRs",[["393","720","203","49994"]],[["687","345","55005","1229996"]]],
 ["GNRs", ...

To obtain the first data row of the table from this intermediate result, it will be worthwhile defining a function that will provide the necessary padding:

def row($i; $padding):
  . as $in
  | [range(0;$padding) | null] as $nulls
  | reduce range(0; length) as $ix 
      ([]; . + ($in[$ix][1][$i] // $nulls));

Now the first data row can be obtained by row(0;4), the second by row(1;4), etc.

The total number of data rows would be given by filtering the intermediate data structure through map(.[1] | length) | max; thus, the data rows can be obtained by tacking the following onto the previous pipeline:

| (map(.[1] | length) | max) as $rows
| range(0; $rows) as $r
| row($r; 4)
| @tsv

Using the -r command-line option and the given sample, the output would be:

393 720 203 49994   649 111 55400   877 748 5624    8746    7773
687 345 55005   1229996                 124 54  8847    5526

Adding the headers is left as an exercise :-)

peak
  • 105,803
  • 17
  • 152
  • 177
  • 1
    Bravo. I just learned more about `jq` than the past 10 hours of tuts and `man` – vintnes Jun 06 '19 at 12:40
  • @peak Thanks so much for your great support. I must say I don't undertand too much the advanced techniques you use. May you suggest me some good site with tutoria and examples for jq or the jq site itself? Regarding the code only works for me before the `row` function definition. I've tried here `https://jqplay.org/s/aUTncmciuj` – Ger Cas Jun 06 '19 at 15:54
  • @GerCas - You have to put *all* the defs before *all* the main pipeline. There should not be a "|" immediately following the defs section. – peak Jun 06 '19 at 18:40
  • @Cergas - Regarding learning resources, see the "Resources" section in my response at https://stackoverflow.com/questions/56435383, and the first Comment thereafter. – peak Jun 06 '19 at 18:47
  • @peak Thanks so much for the correction. Now it works perfect. I tried to add the headers thinking that would be pasted in Excel, This is each header in 4 cells merged. I tried this `["MNRs","\t\t\t","GNRs","\t\t\t","MSNRs","\t\t\t"],` but the output is bad. But this is a minor issue. Regarding the leaning resources, I didn't know the Rossetacode site. It's awesome. Thanks – Ger Cas Jun 06 '19 at 20:21