
As a follow-up to Flatten Arbitrary JSON, I'm looking to take the flattened results and make them suitable for doing queries and updates back to the original JSON file.

Motivation: I'm writing Bash (4.2+) scripts (on CentOS 7) that read JSON into a Bash associative array using the JSON selector/filter as the key. I do processing on the associative arrays, and in the end I want to update the JSON with those changes.

The preceding solution gets me close to this goal. I think there are two things that it doesn't do:

  1. It doesn't quote keys that require quoting. For example, the key com.acme would need to be quoted because it contains a period, which is also the delimiter.
  2. Array indexes are not represented in a form that can be used to query the original JSON.

Existing Solution

The solution from that question is:

$ jq --stream -n --arg delim '.' 'reduce (inputs|select(length==2)) as $i ({};
      [$i[0][]|tostring] as $path_as_strings
    | ($path_as_strings|join($delim)) as $key
    | $i[1] as $value
    | .[$key] = $value
)' input.json

For example, if input.json contains:

{
   "a.b":
   [
      "value"
   ]
}

then the output is:

{
  "a.b.0": "value"
}

What is Really Wanted

An improvement would have been:

{
  "\"a.b\"[0]": "value"
}

But what I really want is output formatted so that it could be sourced directly in a Bash program (implying the array name is passed to jq as an argument):

ArrayName['"a.b"[0]']='value'  # Note 'value' might need escapes for Bash

I'm looking to have the more human-readable syntax above as opposed to the more general:

ArrayName['.["a.b"][0]']='value'
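A minimal pure-Bash sketch of producing that single-quoted form, assuming the flattened key (for example "a.b"[0]) and its value are already available as plain strings. The function name emit_assignment is hypothetical; it only illustrates the quoting, not the flattening itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the question): print a line that Bash can
# source or eval to perform NAME['key']='value'. Both strings are wrapped in
# single quotes; each embedded ' is rewritten as '\'' (close quote, escaped
# quote, reopen quote), which is the only escape single-quoted text needs.
emit_assignment() {
  local name=$1 key=$2 value=$3
  local q="'" esc="'\\''"            # esc holds the 4 characters '\''
  key=${key//"$q"/"$esc"}
  value=${value//"$q"/"$esc"}
  printf "%s['%s']='%s'\n" "$name" "$key" "$value"
}

# Example: the key from the question, with a value that needs escaping
declare -A ArrayName
emit_assignment ArrayName '"a.b"[0]' "it's a value"
# prints: ArrayName['"a.b"[0]']='it'\''s a value'
eval "$(emit_assignment ArrayName '"a.b"[0]' "it's a value")"
```

Writing such lines to a file and sourcing it behaves the same as the eval here. printf '%s[%q]=%q\n' is the other common way to produce re-readable assignments, but it yields backslash-escaped keys rather than the readable form asked for above.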

I don't know if jq can handle all of this. My present solution is to take the output from the preceding solution and post-process it into the form that I want. Here's the work in progress:

#!/bin/bash
Flatten()
{
local -r OPTIONS=$(getopt -o d:m:f: -l "delimiter:,mapname:,file:" -n "${FUNCNAME[0]}" -- "$@")
eval set -- "$OPTIONS"

local Delimiter='.' MapName=map File=
while true ; do
   case "$1" in
   -d|--delimiter)   Delimiter="$2"; shift 2;;
   -m|--mapname)     MapName="$2"; shift 2;;
   -f|--file)        File="$2"; shift 2;;
   --)               shift; break;;
   esac
done

# Strip comment (#) and annotation (%) lines, flatten with jq, and print raw
# "key<TAB>value" lines (-r avoids the JSON string quoting that sed had to undo)
local -a Array=()
readarray -t Array <<<"$(
   sed 's|^\s*[#%].*||' "$File" |
   jq -r --stream -n --arg delim "$Delimiter" '
      reduce (inputs|select(length==2)) as $i ({}; .[[$i[0][]|tostring]|join($delim)] = $i[1])
      | to_entries[] | "\(.key)\t\(.value|tostring)"')"

# Create the map as a global associative array if it does not exist yet
if [[ ! -v $MapName ]]; then
   declare -gA "$MapName"
fi

# Emit MapName[key]=value lines with both key and value %q-escaped, then source them
. <(
   IFS=$'\t'
   while read -r Key Value; do
      [[ -n $Key ]] || continue   # empty input leaves one blank line; skip it
      printf '%s[%q]=%q\n' "$MapName" "$Key" "$Value"
   done <<<"$(printf "%s\n" "${Array[@]}")"
)
}
declare -A Map
Flatten -m Map -f "$1"
declare -p Map

With the output:

$ ./Flatten.sh <(echo '{"a.b":["value"]}')
declare -A Map='([a.b.0]="value" )'
Steve Amerige
  • Note: In the OP is the idiom `$(sed 's|^\s*[#%].*||' "$File")`. The script handles *Annotated JSON*: JSON that can include comments that begin with a `#` and *annotations* (similar to Java annotations) that begin with a `%`. These need to be stripped out before processing with `jq`. – Steve Amerige Mar 02 '17 at 11:22
  • I think the annotated JSON is going to be a show stopper; you'll need a library that can handle it. (Which is an additional argument for not doing this kind of data processing in shell.) – chepner Mar 02 '17 at 13:24
  • @chepner I wanted to say that, especially since the OP has been trying this [for a while](https://stackoverflow.com/questions/42299905/using-jq-flatten-arbitrary-json-to-delimiter-separated-flat-dictionary), but looking at the OP's profile I found that he seems to be working on [this](https://eggsh.com/#module-owlcarousel-548-1). But I agree it is a waste of development and processing time to do such things with the shell. – hek2mgl Mar 02 '17 at 13:34
  • @hek2mgl It turns out that there is a need for this kind of thing which is why I'm spending a fair hunk of time developing what is (poorly) documented right now in https://eggsh.com. I suppose one person's waste is another person's opportunity. In either case, as I continue with this open source development, I have had a lot of good feedback from the Open Source community and from stackoverflow.com. Thanks! – Steve Amerige Mar 02 '17 at 13:58

1 Answer

1) jq is Turing complete, so it's all just a question of which hammer to use.

2)

An improvement would have been:

{ "\"a.b\"[0]": "value" }

That is easily accomplished using a helper function along these lines:

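# Input: one path array such as ["a.b",0]; output: a key such as "a.b"[0]
def flattenPath(delim):
  reduce .[] as $s ("";
    if $s|type == "number"
    then ((if . == "" then "." else . end) + "[\($s)]")
    else (if . == "" then . else . + delim end)
         + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
    end );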
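The helper turns a single path array into a single key. As a sketch of how it might be wired up (this is not the answerer's code), the function below applies a flattenPath-style helper to every leaf event of jq --stream, joins consecutive object keys with the delimiter, and prints key<TAB>value lines that a Bash read loop can consume. It assumes jq 1.5 or later is on PATH; flatten_tsv is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch (assumes jq 1.5+ on PATH; flatten_tsv is a hypothetical name):
# print one "key<TAB>value" line per leaf of a JSON file. Keys containing the
# delimiter are double-quoted, array indexes are written as [N], and
# consecutive object keys are joined by the delimiter.
flatten_tsv() {
  local file=$1 delim=${2:-.}
  jq -r --stream --arg delim "$delim" '
    def flattenPath(delim):
      reduce .[] as $s ("";
        if ($s | type) == "number"
        then (if . == "" then "." else . end) + "[\($s)]"
        else (if . == "" then . else . + delim end)
             + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
        end);
    # --stream emits [path, leaf] for leaves and [path] for closing events
    select(length == 2)
    | "\(.[0] | flattenPath($delim))\t\(.[1] | tostring)"
  ' "$file"
}

# Example:
#   flatten_tsv <(echo '{"a.b":["value"],"x":{"y.z":1}}')
# prints (TAB-separated):
#   "a.b"[0]    value
#   x."y.z"     1
```

Because every value goes through tostring, the number 1 and the string "1" come out identical; emitting the value's JSON text with tojson instead would keep the types distinguishable for the trip back.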

3)

I do processing on the associative arrays, and in the end I want to update the JSON with those changes.

This suggests you might have posed an XY problem. However, if you really do want to serialize and unserialize some JSON text, then the natural way to do so using jq is with leaf_paths, as illustrated by the following serialization/deserialization functions:

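# serialize.jq: emit (path, value) pairs, one JSON text per line
# Usage: jq -c -f serialize.jq input.json > serialized.json
# Note: leaf_paths is paths(scalars), which (in jq 1.5 and 1.6) skips leaves equal to false or null
def serialize: leaf_paths as $p | ($p, getpath($p));
serialize

# unserialize.jq: rebuild the document from alternating path/value lines
# Usage: jq -n -f unserialize.jq serialized.json
def unserialize:
  # group the input stream into [path, value] pairs
  def pairwise(s):
    foreach s as $i ([];
      if length == 1 then . + [$i] else [$i] end;
      select(length == 2));
  reduce pairwise(inputs) as $p (null; setpath($p[0]; $p[1]));
unserialize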

If using bash, you could use readarray (mapfile) to read the paths and values into a single array, or if you want to distinguish between the paths and values more easily, you could (for example) use the approach illustrated by the following:

i=0
while read -r line ; do
  path[$i]="$line"; read -r line; value[$i]="$line"
  i=$((i + 1))
done < serialized.json

But there are many other alternatives.
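If a single associative array is preferred over parallel path/value arrays, the two-line format produced by serialize.jq can be loaded using the compact JSON path text as the key, which keeps the way back through unserialize.jq straightforward. A minimal sketch, assuming the compact (jq -c) serialized output described above; read_pairs and write_pairs are hypothetical names, and values stay as JSON text (so strings keep their quotes).

```bash
#!/usr/bin/env bash
# Sketch: keep serialized leaves in one associative array keyed by the
# compact JSON path text (e.g. ["a.b",0]); values remain JSON text.
declare -A Map=()

# Hypothetical: read alternating path/value lines from stdin into Map
read_pairs() {
  local path value
  while IFS= read -r path && IFS= read -r value; do
    Map[$path]=$value
  done
}

# Hypothetical: write Map back out as alternating path/value lines
write_pairs() {
  local path
  for path in "${!Map[@]}"; do
    printf '%s\n%s\n' "$path" "${Map[$path]}"
  done
}

# Usage (assuming serialize.jq and unserialize.jq as described in the answer):
#   read_pairs < <(jq -c -f serialize.jq input.json)
#   ... process Map in Bash ...
#   write_pairs | jq -n -f unserialize.jq > output.json
```

Process substitution is used for reading so that read_pairs runs in the current shell; piping into it would fill Map in a subshell and lose the result.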

peak
  • Thanks for the assist. The improvement is appreciated. Actually, it isn't an XY problem because my need is to do processing, sometimes extensive processing, in Bash. I just need to fetch the data initially from JSON before I can begin processing. And, a convenient way for me to store the data is using Bash associative arrays because this (1) maintains a 1:1 relationship to the original data; and, (2) allows me to round-trip the data back to JSON. So, I really am looking for the "X" part: how to flatten JSON data into a Bash associative array. Again, thanks for your excellent `jq` expertise! – Steve Amerige Mar 03 '17 at 13:00
  • Do you have something that puts the above together to get me the output that I need (a Bash associative array key as described in my OP)? With your definition in flattenpath.jq, I tried `jq "$(cat flattenpath.jq) "'.|flattenPath(".")' input.json` and got `"[\"value\"]"`. – Steve Amerige Mar 03 '17 at 19:47
  • @SteveAmerige - Please see the updated answer. Please also see the jq documentation regarding how to invoke jq correctly. You will almost surely want to use the -f option (amongst others). – peak Mar 03 '17 at 20:32
  • Because the associative array is going to be used in processing, I need the keys to be intuitive for the person writing code to access them without undue complexity. That's why I want keys to be formed as in the OP (e.g., `."a.b"[0].x."y.z"`). The serialization is only needed for initial reading and final writing. I've tried your suggestions above and I haven't seen that they generate keys the way I'm hoping to generate them. Am I missing something? – Steve Amerige Mar 05 '17 at 13:09
  • I don't really see what is more intuitive about `."a.b"[0].x."y.z"` than `[["a.b"],[0],["x"],["y.z"]]`, but it does look as though the answer to your last question might be: Yes. Indeed, I've already written "flattenPath" .... – peak Mar 05 '17 at 16:46
  • I was not able to take the information you provided above and piece it together to see an actual solution. Would you please update your answer to show the actual invocation along with inputs and results to show your full solution? – Steve Amerige Mar 07 '17 at 08:20