
I need to split the results of a SonarQube analysis history into individual files. Assuming the starting input below:

    {
      "paging": {
        "pageIndex": 1,
        "pageSize": 100,
        "total": 3
      },
      "measures": [
        {
          "metric": "coverage",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "100.0"
            },
            {
              "date": "2018-11-21T12:22:39+0000",
              "value": "100.0"
            },
            {
              "date": "2018-11-21T13:09:02+0000",
              "value": "100.0"
            }
          ]
        },
        {
          "metric": "bugs",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "0"
            },
            {
              "date": "2018-11-21T12:22:39+0000",
              "value": "0"
            },
            {
              "date": "2018-11-21T13:09:02+0000",
              "value": "0"
            }
          ]
        },
        {
          "metric": "vulnerabilities",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "0"
            },
            {
              "date": "2018-11-21T12:22:39+0000",
              "value": "0"
            },
            {
              "date": "2018-11-21T13:09:02+0000",
              "value": "0"
            }
          ]
        }
      ]
    }

How do I use jq to clean the results so that each output file retains, for every element, only the history entries for a single date? The desired output is something like this (e.g. output-20181118123808.json for the analysis done on "2018-11-18T12:37:08+0000"):

    {
      "paging": {
        "pageIndex": 1,
        "pageSize": 100,
        "total": 3
      },
      "measures": [
        {
          "metric": "coverage",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "100.0"
            }
          ]
        },
        {
          "metric": "bugs",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "0"
            }
          ]
        },
        {
          "metric": "vulnerabilities",
          "history": [
            {
              "date": "2018-11-18T12:37:08+0000",
              "value": "0"
            }
          ]
        }
      ]
    }

I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON files will be handled outside of the jq utility. The sample data provided will be split into 3 files, but other inputs can have a variable number of entries, possibly up to 10,000. Thanks.

peak
ramfree17
  • Part of the Q is garbled ("how can I remo"). Also, given the input shown, how many files are you expecting? One for each date? How should the files be named? – peak Nov 27 '18 at 04:10
  • naming of the output files is originally going to be handled separately but if it can be done from within jq then that will be a big plus! :D – ramfree17 Nov 27 '18 at 05:23

2 Answers


Here is a solution that uses awk to write the distinct files. It assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates or the number of distinct measures.

    jq -c 'range(0; .measures[0].history|length) as $i
      | (.measures[0].history[$i].date | gsub("[^0-9]";"")),  # basis of the filename
        reduce range(0; .measures|length) as $j (.;
          .measures[$j].history |= [.[$i]])' input.json |
    awk 'fn {print > fn; fn=""; next} {gsub(/"/,""); fn="output-" $0 ".json"}'

Comments

The choice of awk here is just for convenience.

The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq itself) would be required for each file. So, if the output in each file must be neat, a case could be made for running jq once for each date, obviating the need for the post-processing (awk) step.
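That once-per-date alternative can be sketched as a shell loop. This is only an illustration, not part of the answer above: the filenames input.json and output-*.json are assumptions, and a trimmed-down sample (two metrics, two dates) is written inline so the snippet is self-contained.

```shell
# Trimmed-down sample in the shape of the question's input (hypothetical
# filename input.json), used only to make this sketch self-contained.
cat > input.json <<'EOF'
{"paging":{"pageIndex":1,"pageSize":100,"total":2},
 "measures":[
   {"metric":"coverage","history":[
     {"date":"2018-11-18T12:37:08+0000","value":"100.0"},
     {"date":"2018-11-21T12:22:39+0000","value":"100.0"}]},
   {"metric":"bugs","history":[
     {"date":"2018-11-18T12:37:08+0000","value":"0"},
     {"date":"2018-11-21T12:22:39+0000","value":"0"}]}]}
EOF

# One jq invocation per date: each run keeps only the history entries
# matching that date and writes a neatly formatted file.
for d in $(jq -r '.measures[0].history[].date' input.json); do
  fname="output-$(printf '%s' "$d" | tr -dc '0-9').json"
  jq --arg d "$d" \
     '.measures |= map(.history |= map(select(.date == $d)))' \
     input.json > "$fname"
done
```

This trades the awk post-processing for one jq run per date, which is more costly on inputs with thousands of dates but produces pretty-printed files directly.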

If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and the corresponding measures would have to be done differently.

Output

The first two lines produced by the invocation of jq above are as follows:

    "201811181237080000"
    {"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}
peak
  • is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity"). – ramfree17 Nov 30 '18 at 08:37
  • Yes, and the same approach can be used, but it would take some work. Since SO is not a free programming service, it might be time to learn jq :-) – peak Nov 30 '18 at 09:18
  • I am trying to learn it and I apologize if I come across as asking for a full program. What I am after is a code snippet that filters (or removes) the non-matching entries of an embedded array. I will try to figure it out based on what you have provided. For now I have gone with the more costly of querying SonarQube for each date which does the filtering. Thanks. – ramfree17 Dec 02 '18 at 15:12

In the comments, the following addendum to the original question appeared:

is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity").

The following will produce a stream of JSON objects, one per date. This stream can be annotated with the date as per my previous answer, which shows how to use these annotations to create the various files. For ease of understanding, we use two helper functions:

    def dates:
      INDEX(.measures[].history[].date; .)
      | keys;

    def gather($date): map(select(.date == $date));

    dates[] as $date
    | .measures |= map( .history |= gather($date) )
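To see the whole pipeline end to end, here is a hedged sketch: the defs above are saved to a file (split.jq and input.json are hypothetical names), the main expression is extended to emit each date (digits only) before its object, and an awk step like the one in the previous answer splits the stream into files. The inline sample deliberately has dates that are not in lock-step.

```shell
# The defs from the answer, with the main expression also emitting the
# digits-only date before each filtered document (hypothetical split.jq).
cat > split.jq <<'EOF'
def dates:
  INDEX(.measures[].history[].date; .)
  | keys;

def gather($date): map(select(.date == $date));

dates[] as $date
| ($date | gsub("[^0-9]"; "")),
  (.measures |= map(.history |= gather($date)))
EOF

# Sample input where "bugs" is missing the second date (not in lock-step).
cat > input.json <<'EOF'
{"measures":[
  {"metric":"coverage","history":[
    {"date":"2018-11-18T12:37:08+0000","value":"100.0"},
    {"date":"2018-11-21T12:22:39+0000","value":"100.0"}]},
  {"metric":"bugs","history":[
    {"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}
EOF

# Alternate date lines and compact objects; awk turns each pair into a file.
jq -c -f split.jq input.json |
awk 'fn {print > fn; fn=""; next} {gsub(/"/,""); fn="output-" $0 ".json"}'
```

A metric with no entry for a given date simply ends up with an empty history array in that date's file.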

INDEX/2

If your jq does not have INDEX/2, now would be an excellent time to upgrade, but in case that's not feasible, here is its def:

    def INDEX(stream; idx_expr):
      reduce stream as $row ({};
        .[$row|idx_expr|
          if type != "string" then tojson
          else .
          end] |= $row);
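As a quick sanity check of what INDEX/2 produces (using the jq 1.6 builtin, or the def above on jq 1.5), keying a stream of objects by a field yields an object whose keys are the field values:

```shell
# INDEX(stream; idx_expr) builds an object keyed by idx_expr.
echo '[{"id":"a","n":1},{"id":"b","n":2}]' |
jq -c 'INDEX(.[]; .id)'
# {"a":{"id":"a","n":1},"b":{"id":"b","n":2}}
```

In the answer above, the stream is the dates themselves and idx_expr is `.`, so INDEX is used purely to deduplicate: `keys` then yields the sorted, distinct dates.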
peak
  • thanks peak. Are these meant to be placed in a file and read by jq? I didn't know that jq allows reading its program from a file, so this is a new way to use jq for me. – ramfree17 Dec 03 '18 at 00:07
  • Yep, it's time to take a closer look at the jq manual :-) https://stedolan.github.io/jq/manual/v1.6/ – peak Dec 03 '18 at 00:21
  • That explains why I can't find the option. The version in Ubuntu 18.04 is still 1.5 and I need to maintain compatibility with the other people who will be using the scripts I will create. Thanks for the heads up! – ramfree17 Dec 03 '18 at 01:37
  • I'm not sure what "option" you're referring to, but I've added the def of INDEX/2 in the response. – peak Dec 03 '18 at 02:00
  • I was referring to the option to read a file. That is not available in jq 1.5. Upgrading is fine for my own machine but I need to maintain compatibility with the other machines in the system. Thanks for all the items to read and understand about. :) – ramfree17 Dec 03 '18 at 10:00
  • The -f option has been available (and documented) for years, so it would seem you are mistaken. – peak Dec 03 '18 at 10:05
  • yup, I am. I was looking at the output of jq --help when I should have RTFM. :D – ramfree17 Dec 03 '18 at 11:50