
EDIT: Going to try to simplify my question, and the JSON examples, down to just the relevant elements.

Building a playbook in Ansible, and one task I am trying to do involves pulling down data from 4 separate QRadar API endpoints and combining some details from each of them.

There are 4 different JSON sources, one for each endpoint:

  • "regex_properties.json" : Has a unique "identifier" field; I need to reference the "name" and "property_type" values.
  • "log_source_types.json" : Has a unique "id" field; I need to reference its "name".
  • "log_sources.json" : Has a unique "id" field, and may include a "type_id" field if it is part of a log_source_type grouping (matches "id" above). I need the "name" field from this, and potentially "last_event_time" for filtering (but can get by without it).
  • "property_expressions.json" : Has a unique "identifier" field. It also records which "log_source_type_id" and/or "log_source_id" each "regex_property_identifier" is mapped to. These values map to the unique identifiers in the other files.
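To make the intended join concrete outside of Ansible/jq, here is a minimal Python sketch of the relationship described above: one lookup table per file, keyed by that file's unique field, then each property expression resolved by key. The sample records are copied from the lab examples later in the question, except that log_source_id is set to 64 here so it actually matches the sample log source (the question's 65 has no matching "id" and would leave ls_name empty).

```python
# Sample records adapted from the question's lab examples
# (log_source_id changed from 65 to 64 so the lookup resolves).
regex_properties = [{"identifier": "59723052-d96c-4cef-ba7b-69d426602e04",
                     "property_type": "numeric",
                     "name": "indexTotalSize"}]
log_source_types = [{"name": "SIM Audit", "id": 105}]
log_sources = [{"id": 64,
                "name": "SIM Audit-2 :: eng-qradar-aio-01",
                "type_id": 105,
                "last_event_time": 1588628234930}]
property_expressions = [{"identifier": "0311c65b-d5b5-483e-943f-b539543a8e95",
                         "log_source_type_id": 105,
                         "log_source_id": 64,
                         "regex_property_identifier": "59723052-d96c-4cef-ba7b-69d426602e04"}]

# One lookup table per file, keyed by that file's unique field.
rp_by_id  = {p["identifier"]: p for p in regex_properties}
lst_by_id = {t["id"]: t for t in log_source_types}
ls_by_id  = {s["id"]: s for s in log_sources}

# Resolve each property expression by key rather than by array position;
# missing mappings fall back to an empty dict, yielding None fields.
merged = []
for pe in property_expressions:
    rp  = rp_by_id.get(pe["regex_property_identifier"], {})
    lst = lst_by_id.get(pe.get("log_source_type_id"), {})
    ls  = ls_by_id.get(pe.get("log_source_id"), {})
    merged.append({**pe,
                   "property_type": rp.get("property_type"),
                   "regex_name": rp.get("name"),
                   "lst_name": lst.get("name"),
                   "ls_name": ls.get("name")})
```

The variable names are my own; the point is only that the join is a keyed lookup, not a positional merge.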

Examples from Lab:

regex_properties.json
[
  {
    "identifier": "59723052-d96c-4cef-ba7b-69d426602e04",
    "property_type": "numeric",
    "name": "indexTotalSize"
  }
]

log_sources.json
[
  {
    "id": 64,
    "name": "SIM Audit-2 :: eng-qradar-aio-01",
    "type_id": 105,
    "last_event_time": 1588628234930
  }
]

log_source_types.json
[
  {
    "name": "SIM Audit",
    "id": 105
  }
]

property_expressions.json
[
  {
    "identifier": "0311c65b-d5b5-483e-943f-b539543a8e95",
    "log_source_type_id": 105,
    "log_source_id": 65,
    "regex_property_identifier": "59723052-d96c-4cef-ba7b-69d426602e04"
  }
]

I would like to pull in these 4 sources and output a file that has the following data, linked by property_expressions.json:

  • The "name" & "property_type" from regex_properties.json (with "name" renamed to regex_name or something similar)
  • The "name" from log_sources.json and log_source_types.json (renamed to ls_name & lst_name, respectively)

Such as below

merged_example.json
[
  {
    "identifier": "0311c65b-d5b5-483e-943f-b539543a8e95",
    "log_source_type_id": 105,
    "log_source_id": 65,
    "regex_property_identifier": "59723052-d96c-4cef-ba7b-69d426602e04",
    "property_type": "numeric",
    "regex_name": "indexTotalSize",
    "lst_name": "SIM Audit",
    "ls_name": "SIM Audit-2 :: eng-qradar-aio-01"
  }
]

Or into a CSV with the same data, which is the end goal of the export, but that can wait.
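For the CSV end goal, the flattening step itself is mechanical once merged records exist; a Python sketch with csv.DictWriter, using the record from merged_example.json above (the column order is just the key order, an assumption on my part):

```python
import csv
import io

# The single merged record from merged_example.json above.
merged = [{
    "identifier": "0311c65b-d5b5-483e-943f-b539543a8e95",
    "log_source_type_id": 105,
    "log_source_id": 65,
    "regex_property_identifier": "59723052-d96c-4cef-ba7b-69d426602e04",
    "property_type": "numeric",
    "regex_name": "indexTotalSize",
    "lst_name": "SIM Audit",
    "ls_name": "SIM Audit-2 :: eng-qradar-aio-01",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(merged[0]))
writer.writeheader()      # header row taken from the key names
writer.writerows(merged)  # one CSV row per merged record
print(buf.getvalue())
```

Writing to a real file instead of io.StringIO is a one-line change; the same keys would become the fieldnames passed to Ansible's read_csv.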

I tried renaming "identifier" to "regex_property_identifier" in regex_properties.json, then using 'jq -s . regex_properties.json property_expressions.json', but I'm still just seeing both contents as separate arrays in the same output/file.

I've tried using ansible and doing something like:

  - name: use JQ to reformat json to csv
    shell: jq -r '.[] | select(.enabled == true) | [.identifier, .regex_property_identifier, .log_source_id, .log_source_type_id] | @csv' /tmp/property_expressions.json > /tmp/props.csv

  - name: Read CSV into dictionary
    read_csv:
      path: "/tmp/props.csv"
      fieldnames: "prop_id,regex_id,ls_id,lst_id"
      delimiter: ","
    register: props

  - name: Loop Prop Dictionary and replace in CSV the regex_id
    replace:
      path: "/tmp/props.csv"
      regexp: "{{ item.regex_id }}"
      replace: "{{ regex_properties.json | json_query(regex_name_q) }},{{ regex_properties.json | json_query(regex_type_q) }}"
    loop: "{{ props.list }}"
    vars:
      regex_name_q: "{{ item.regex_id }}.name"
      regex_type_q: "{{ item.regex_id }}.property_type"

In order to just make a CSV and find/replace the terms item by item. But if I can do it within the JSON arrays, that'd be cleaner.

Stranjer

1 Answer


Assuming away the minor errors in the JSON examples in the question, the following bash script produces the output as shown:

#!/bin/bash

jq -n \
 --argfile lst log_source_types.json \
 --argfile ls  log_sources.json \
 --argfile pe  property_expressions.json \
 --argfile rp  regex_properties.json '
  [ range(0, $pe|length) as $i
    | {identifier: $pe[$i].identifier,
       log_source_type_id: $pe[$i].log_source_type_id,
       log_source_id: $pe[$i].log_source_id,
       regex_property_identifier: $pe[$i].regex_property_identifier,
       property_type: $rp[$i].property_type,
       regex_name: $rp[$i].name,
       lst_name: $lst[$i].name,
       ls_name: $ls[$i].name
     }
  ]
'

Note: I wouldn't be too concerned that --argfile is officially deprecated, but if that bothers you, there are many workarounds, though some are version-dependent. If you want a non-deprecated solution that will work with every version of jq, I'd use the form:

jq -s -f program.jq \
 log_source_types.json \
 log_sources.json \
 property_expressions.json \
 regex_properties.json

where program.jq begins by defining the four $-variables in the order the files are given on the command line: .[0] as $lst | .[1] as $ls | .[2] as $pe | .[3] as $rp | followed by the main filter.

peak
  • This works pretty great. There are a few cases where I'm seeing some empty values, but it works for the majority of cases, and the cases with empty values are likely just edge cases. – Stranjer May 07 '20 at 19:19