I'm using jq to find patterns in a very large JSON file (500MB+) with the following flat object structure:

{
   "prop1": "large string",
   "prop2": "another large string",
   "prop3": "yet another large string",
   ...
}

The query below works fine and takes less than 15 seconds to return results:

jq 'map(select(contains("PATTERN")==true))' largefile.json > res.json

but it returns an array of the strings in which the pattern is found, so I lose the property names. When I use map_values instead, so that I also get the property names, as in:

jq 'map_values(select(contains("PATTERN")==true))' largefile.json > res.json

the query takes forever.

Is there an equivalent query that is as fast as the map version and that also returns the key:value pairs?

peak
greywolf

2 Answers

1

Since your JSON file is not too big for jq to read, a simple and efficient solution (modulo the use of jq to read the file into memory) would be to use keys_unsorted/0 and test/1:

keys_unsorted[] as $k
| select(.[$k] | test("another"))
| [$k, .[$k]]

(Using map_values would be unnecessarily inefficient, and using contains is probably not a good idea unless you fully understand its complications.)
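As a minimal sketch of how the filter above might be run from the shell: the following uses a small, hypothetical sample.json (not the file from the question) and swaps the final [$k, .[$k]] for {($k): .[$k]}, so that each match is emitted as a one-key object. The pattern "another" is the one from the example above, the values are assumed to be strings, and -c only makes jq print each result on a single line.

```shell
# Hypothetical sample data with the same flat shape as in the question.
printf '%s\n' '{"prop1":"large string","prop2":"another large string","prop3":"yet another large string"}' > sample.json

# Emit one {key: value} object per property whose value matches the regex.
jq -c '
  keys_unsorted[] as $k
  | select(.[$k] | test("another"))
  | {($k): .[$k]}
' sample.json
```

Note that test/1 interprets its argument as a regular expression, so a literal pattern containing regex metacharacters would need escaping.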

If you require the output to be a single object, you could either adapt the above, or (at the cost of the memory required for the output object):

. as $in
| reduce keys_unsorted[] as $k ({};
    if ($in[$k] | test("another"))
    then  .[$k] = $in[$k]
    else . end)

Very Large Files

For files that are too big for jq to read into memory, you could use jq's streaming parser, i.e. the --stream command-line option. Unfortunately, this is easier said than done, but one straightforward approach is to use atomize, as defined e.g. in the answer to "jq --stream filter on multiple values of same key".
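For the flat object in the question, a streaming filter can be sketched without any helper definitions. This is only an illustration under stated assumptions: sample.json is a hypothetical stand-in for the large file, every value is assumed to be a string, and the top level is assumed to be a single flat object. With --stream, jq emits one [path, value] event per leaf, plus a shorter closing event that holds only a path.

```shell
# Hypothetical sample data with the same flat shape as in the question.
printf '%s\n' '{"prop1":"large string","prop2":"another large string","prop3":"yet another large string"}' > sample.json

jq -c --stream '
  select(length == 2)               # keep [path, value] events, drop closing events
  | select(.[1] | test("another"))  # .[1] is the string value
  | {(.[0][0]): .[1]}               # .[0][0] is the top-level key
' sample.json
```

Each match is printed as its own one-key object, so the output is a stream of objects rather than a single JSON object. The streaming parser trades speed for memory, so this is mainly worth trying when the file cannot be read in the ordinary way.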

peak
  • the keys_unsorted example you provided here does not yield a proper JSON object as a result. the curly brackets are missing at the top and bottom of the file. – greywolf May 06 '19 at 14:31
  • Your question asked for the `key:value` pairs. Maybe you should clarify your requirements, e.g. in accordance with [mcve]. In the meantime, maybe you can figure out how to tweak the jq program to achieve what you want, e.g. starting with `{($k): .[$k]}` – peak May 06 '19 at 14:50
-1

Just use with_entries/1. It lets you filter the properties of an object based on the key and/or the value, keeping only the entries you select.

with_entries(select(.value | contains("PATTERN")))
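For reference, with_entries(f) is shorthand for to_entries | map(f) | from_entries, so the filter above can be sketched in its expanded form as follows. The sample.json file and the pattern "another" are illustrative assumptions, not the question's actual data.

```shell
# Hypothetical sample data with the same flat shape as in the question.
printf '%s\n' '{"prop1":"large string","prop2":"another large string","prop3":"yet another large string"}' > sample.json

# Expanded form of with_entries(select(.value | contains("another"))).
jq -c '
  to_entries                                   # [{key, value}, ...]
  | map(select(.value | contains("another")))  # keep matching entries
  | from_entries                               # rebuild a single object
' sample.json
```

The result is a single object containing only the matching properties. Because every property is first wrapped in a {key, value} object and then unwrapped again, this form likely does more work on a large input than iterating over keys_unsorted directly.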
Jeff Mercado