
Given

[
  {"json1": "http://example.com/remote1.json"},
  {"json2": "http://example.com/remote2.json"}
]

with remote1.json and remote2.json containing [1] and [2] respectively

How can I turn it into

[{"json1": [1], "json2": [2]}]

using jq? I suspect other CLI tools like bash and curl are needed, but I have no idea how to merge the responses back.

peak
hgl
  • Having `jq` follow JSON references (`[{"json1": {"$ref": "http://example.com/remote1.json"}}, {"json2": {"$ref": "http://example.com/remote2.json"}}]`) would be interesting. – chepner Mar 10 '17 at 13:29
  • @chepner Sorry, are you saying jq is able to download url references automatically or you wish it had this feature? – hgl Mar 10 '17 at 13:33
  • To be clear, it does not do it now. I'm kind of +0 on actually adding it; network access seems outside the scope of what `jq` does. – chepner Mar 10 '17 at 14:15
  • But I think this is a pretty common usage. I've simplified the question a bit. The real usage is that the JSON contains objects like `{id: 12, name: "John"}`, and there is another JSON document, at a URL that can be constructed from `12`, containing detailed information about John, and I want to merge it back into this summary object. I believe this pattern, where a list contains summary objects and the detail information for each item lives at another URL, is very common. – hgl Mar 10 '17 at 14:42
  • BTW, the literal curly braces around the whole thing are actually a complicating factor. Will there only ever be one surrounding list, or can we have a stream with more? – Charles Duffy Mar 10 '17 at 15:43
  • Once upon a time, the `jq` language was simple enough that a script written in it could be guaranteed to complete in constant time (well, `O(n)` with length of input), and was sufficiently bounded that one could run arbitrary jq scripts defined by untrusted 3rd parties. The facilities that make this no longer the case are useful ones, but it isn't worth pretending that nothing has been lost. – Charles Duffy Mar 10 '17 at 16:11

3 Answers


XPath/XQuery has network access functions, since the W3C loves URI-references. If you are open to other tools, you could try my XPath/XQuery/JSONiq interpreter:

xidel master.json -e '[$json()()!{.:json($json()(.))}]'

Syntax:

  1. $json is the input data

  2. json() is a function to retrieve JSON

  3. () are array values or object keys

  4. ! maps over a sequence of values, where . is the current value
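
For the sample master.json from the question (and assuming the two URLs really do serve [1] and [2]), the invocation and the result it is meant to produce would look roughly like this:

xidel master.json -e '[$json()()!{.:json($json()(.))}]'
# expected: [{"json1": [1], "json2": [2]}]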

BeniBela
  • Xidel: my favorite tool for writing extractors, and usually in the most effective/shortest way possible... a one-liner. Brilliant. Now BeniBela should work on the marketing... because very few people know about Xidel. On my bucket list: create a Xidel fansite. Now I only need time... – MatrixView Mar 11 '17 at 18:50
  1. Network access has been proposed for jq but rejected, because of some combination of security, complexity, portability, and bloatware concerns.

  2. Shelling out has likewise been proposed but still seems some way off.

  3. It would be quite easy to achieve what I understand to be the goal here, using jq and curl in conjunction with a scripting language such as bash; a rough sketch of that combination (without the serialization step) follows this list. One way would be to serialize the JSON, and then "edit" the serialized JSON using curl, before deserializing it. For serialization/deserialization functions in jq, see e.g. How to Flatten JSON using jq and Bash into Bash Associative Array where Key=Selector?

  4. If all strings that are valid URLs are to be replaced, then identifying them could in principle be done before or after serialization. If only a subset of such strings are to be dereferenced, then the choice might depend on the specific requirements.
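
As a rough illustration of point 3 (the plain jq + curl + bash combination rather than the serialize/edit/deserialize route), here is a minimal sketch that assumes the sample input from the question and that the two URLs really serve [1] and [2]:

#!/usr/bin/env bash
# Sketch only: pull out each URL-valued entry with jq, fetch it with curl,
# and splice the response back in with --argjson.
input='[
  {"json1": "http://example.com/remote1.json"},
  {"json2": "http://example.com/remote2.json"}
]'

result='{}'
while IFS=$'\t' read -r key url; do
  body=$(curl -s "$url")    # the referenced JSON, e.g. [1]
  result=$(jq --arg k "$key" --argjson v "$body" '. + {($k): $v}' <<<"$result")
done < <(jq -r '.[] | to_entries[] | select(.value | test("://")) | [.key, .value] | @tsv' <<<"$input")

jq -c '[.]' <<<"$result"    # prints [{"json1":[1],"json2":[2]}]

This only handles URLs sitting directly inside the top-level objects; for URLs at arbitrary depths, the serialization approach above (or a recursive walk, as in the answer below) is a better fit.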

peak
  • Thanks. I guess editing serialized JSON could work, but it sounds primitive, and if I could edit it I could also just edit the original JSON (maybe also turning the URL into some special string beforehand so it's easier to replace). I feel calling out to the shell from inside jq might solve this problem more elegantly, but it seems that's not implemented yet. In the GitHub issue list I saw people suggesting `input`/`inputs` as a mitigation, but I have no idea how to use them (the docs aren't very clear on them). Do you happen to be familiar with them? Do you think they can help here? – hgl Mar 10 '17 at 15:07
  • Input/inputs are straightforward but completely irrelevant here. – peak Mar 10 '17 at 15:32

First, our test framework:

# Stub out curl so the example is reproducible offline
curl() {
  case $1 in
    http://example.com/remote1.json) echo "[1]" ;;
    http://example.com/remote2.json) echo "[2]" ;;
    *) echo "IMABUG" ;;
  esac
}
input_json='[
  {"json1": "http://example.com/remote1.json"},
  {"json2": "http://example.com/remote2.json"}
]'

Then, our actual code:

# defines the "walk" function, which is not yet included in a released version of jq
# ...in the future, this will not be necessary.
walk_fn='
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;
'

# Emit one TAB-separated "key<TAB>url" line for every string value containing "://"
get_url_keys() {
  jq -r "$walk_fn
    walk(
      if type == \"object\" then
        to_entries
      else . end
    )
    | flatten
    | .[]
    | select(.value | test(\"://\"))
    | [.key, .value]
    | @tsv"
}
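
# For the sample input above, get_url_keys prints one tab-separated line per URL:
#   json1   http://example.com/remote1.json
#   json2   http://example.com/remote2.json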

# Build one --arg/--argjson pair and one walk() expression per URL found
operations=( )
options=( )
i=0
while IFS=$'\t' read -r key url; do
  # fetch the remote document and bind its parsed JSON to $value$i
  options+=( --arg "key$i" "$key" --argjson "value$i" "$(curl "$url")" )
  operations+=(
    " walk(
        if type == \"object\" then
          if .[\$key$i] then .[\$key$i]=\$value$i else . end
        else . end
      ) "
  )
  (( ++i ))
done < <(get_url_keys <<<"$input_json")

IFS='|' # separate operations with a | character
jq -c "${options[@]}" "${walk_fn} ${operations[*]}" <<<"$input_json"

The output is:

[{"json1":[1]},{"json2":[2]}]
Charles Duffy