
I am processing a very large JSON file in which I need to filter the inner JSON objects by the value of a key. My JSON looks like this:

{"userActivities":{"L3ATRosRdbDgSmX75Z":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-20"},"L3ATSFGrpAYRkIIKqrh":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-21"},"L3AVHvmReBBPNGluvHl":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-22"},"L3AVIcqaDpZxLf6ispK":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-19"}}}

I want to filter on localDate so that only the objects whose localDate is "2018-01-20" or "2018-01-21" are kept, with the output looking like this:

{"userActivities":{"L3ATRosRdbDgSmX75Z":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-20"},"L3ATSFGrpAYRkIIKqrh":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-21"}}}
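For a file small enough to load in one go, the filter being asked for can be expressed directly. A minimal Python sketch for comparison, assuming the sample above (with the missing quote after the last "Friday" restored) is parsed with json.loads; filter_activities and WANTED_DATES are hypothetical names. Because it parses the whole document in memory, it does not address the very large file in the question, which is what the streaming answer below is for.

```python
import json

# Dates to keep (from the question); any other localDate is dropped.
WANTED_DATES = {"2018-01-20", "2018-01-21"}


def filter_activities(doc, wanted=WANTED_DATES):
    """Return {"userActivities": {...}} keeping only entries whose localDate is in wanted."""
    activities = doc.get("userActivities", {})
    kept = {
        key: record
        for key, record in activities.items()
        if record.get("localDate") in wanted
    }
    return {"userActivities": kept}


sample = json.loads(
    '{"userActivities":{'
    '"L3ATRosRdbDgSmX75Z":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-20"},'
    '"L3ATSFGrpAYRkIIKqrh":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-21"},'
    '"L3AVHvmReBBPNGluvHl":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-22"},'
    '"L3AVIcqaDpZxLf6ispK":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-19"}}}'
)
result = filter_activities(sample)
# Compact separators reproduce the single-line layout of the desired output.
print(json.dumps(result, separators=(",", ":")))
```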

I asked a similar question here and realised that I need to filter on multiple values and retain the original structure of the JSON.

https://stackoverflow.com/questions/52324497/how-to-filter-json-using-jq-stream

Thanks a ton in advance!

Asked by Sains; edited by peak.
  • Please explain exactly how the large JSON file relates to the sample JSON. E.g. does the former consist of just one JSON object with just one key? – peak Sep 14 '18 at 09:31
  • The JSON file is around 20 GB. It contains the userActivities key and, under that, millions of randomly generated keys like L3ATRosRdbDgSmX75Z, as shown in the sample JSON. Under each of these keys is a localDate field, and I need to extract the output JSON (as shown in the question) with localDate falling in Aug 2018. – Sains Sep 14 '18 at 10:07

1 Answer


From the jq Cookbook, let's borrow def atomize(s):

# Convert an object (presented in streaming form as the stream s) into
# a stream of single-key objects
# Examples:
#   atomize({a:1,b:2}|tostream)
#   atomize(inputs) (used in conjunction with "jq -n --stream")
def atomize(s):
  fromstream(foreach s as $in ( {previous:null, emit: null};
      if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
      then {emit: [[.previous]], previous: $in|.[0][0]}
      else { previous: ($in|.[0][0]), emit: null}
      end;
      (.emit // empty), $in) ) ;
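To see what atomize does, here is a small Python model of jq's --stream event format and of atomize itself. It is a sketch under simplifying assumptions: to_stream and atomize are hypothetical helpers that work on an in-memory value (so this is not a streaming parser), arrays are not expanded, and instead of injecting a closing event and calling fromstream as the jq definition does, the model rebuilds each entry directly from its leaf events. The key property is the same: all events for one top-level key arrive together, so only one entry needs to be held at a time.

```python
def to_stream(value, path=()):
    """Yield events shaped like jq --stream output for an in-memory value.

    Leaf events are [path, scalar]; when a non-empty object closes, a
    one-element event [path_of_its_last_child] is emitted. Arrays are
    treated as leaves here (real jq would expand them).
    """
    if isinstance(value, dict) and value:
        last = None
        for key, child in value.items():
            last = key
            yield from to_stream(child, path + (key,))
        yield [list(path + (last,))]
    else:
        yield [list(path), value]


def atomize(events):
    """Turn the events of one object into a stream of single-key objects."""
    current_key, current = None, None
    for event in events:
        if len(event) != 2:  # closing events carry no value
            continue
        path, leaf = event
        key = path[0]
        if key != current_key:  # a new top-level key: the previous entry is complete
            if current is not None:
                yield current
            current_key, current = key, {}
        # Rebuild the nested value under `key` (objects only).
        node = current
        for step in path[:-1]:
            node = node.setdefault(step, {})
        node[path[-1]] = leaf
    if current is not None:
        yield current


print(list(atomize(to_stream({"a": 1, "b": 2}))))
```

For example, the model turns {"a":1,"b":2} into {"a":1} and {"b":2}, matching the first example in the comments of the jq definition.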

Since the top-level object described by the OP contains just one key, we can select the August 2018 objects as follows:

atomize(1|truncate_stream(inputs))
| select( .[].localDate[0:7] == "2018-08")
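The selection step can be modelled the same way. In this Python sketch, the hypothetical helper select_month takes any iterable of single-key objects (in the jq version, the output of atomize) and keeps those whose localDate begins with the given year-month prefix. As with .localDate[0:7] in jq, a missing localDate gives nothing to compare, so the object is dropped.

```python
def select_month(objects, prefix="2018-08"):
    """Yield single-key objects whose inner localDate starts with prefix (YYYY-MM).

    Mirrors select(.[].localDate[0:7] == prefix). A missing localDate never
    matches; non-object values are skipped here (jq would report an error).
    """
    for obj in objects:
        for record in obj.values():
            date = record.get("localDate") if isinstance(record, dict) else None
            if isinstance(date, str) and date[:7] == prefix:
                yield obj


objs = [
    {"a": {"localDate": "2018-08-03"}},
    {"b": {"localDate": "2018-01-20"}},
    {"c": {"dow": "Friday"}},
]
print(list(select_month(objs)))
```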

If you want these collected into a single composite object, you will need to be careful about memory, so you might prefer to pipe the selected objects to another program (e.g. awk or jq). Otherwise, I'd go with:

def add(s): reduce s as $x (null; .+$x);

{"userActivities": add(
    atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
    | select( .[].localDate[0:7] == "2018-08") ) }
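The add(s) helper above is just a reduce that merges objects with +, where later keys win. A Python sketch of the same merge, using the hypothetical name add_objects; like the jq version, it keeps the whole merged result in memory, which is the cost the answer warns about.

```python
def add_objects(objects):
    """Merge a stream of objects like reduce s as $x (null; . + $x).

    Later keys overwrite earlier ones (a shallow merge, as with jq's +),
    and an empty stream gives None, the counterpart of jq's null.
    """
    merged = None
    for obj in objects:
        if merged is None:
            merged = {}
        merged.update(obj)
    return merged


selected = [{"k1": {"localDate": "2018-08-01"}}, {"k2": {"localDate": "2018-08-02"}}]
print({"userActivities": add_objects(selected)})
```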

Variation

If the top-level object has more than one key, then the following variation would be appropriate:

atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
| select( .[].localDate[0:7] =="2018-08")
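Why the select(.[0][0] == "userActivities") matters can be seen by modelling the two stream operations in Python. The helpers truncate_stream and events_under below are hypothetical stand-ins for 1|truncate_stream(...) and the select on the first path element; the events list is written by hand in the shape jq --stream produces for an assumed document with an extra top-level key "meta". Without the select, the meta entry would be truncated into something that looks like just another activity.

```python
def truncate_stream(events, depth=1):
    """Model of depth|truncate_stream(events): drop events whose path has at
    most `depth` elements and strip the first `depth` elements from the rest."""
    for event in events:
        path = event[0]
        if len(path) > depth:
            yield [path[depth:]] + event[1:]


def events_under(events, top_key):
    """Model of select(.[0][0] == top_key) applied to each event."""
    for event in events:
        if event[0] and event[0][0] == top_key:
            yield event


# Hand-written events for {"meta":{"v":1},"userActivities":{"k1":{"localDate":"2018-08-01"}}}
events = [
    [["meta", "v"], 1],
    [["meta", "v"]],
    [["userActivities", "k1", "localDate"], "2018-08-01"],
    [["userActivities", "k1", "localDate"]],
    [["userActivities", "k1"]],
    [["userActivities"]],
]

print(list(truncate_stream(events_under(events, "userActivities"))))
```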
Answered by peak.
  • My output contains a separate object for each key. As shown in the sample output, I want to retain the original JSON structure, but I'm getting each key as a separate JSON object. – Sains Sep 16 '18 at 23:47
  • You should be able to figure out a way to do that, as there are many, many possibilities. – peak Sep 17 '18 at 01:08
  • Cool! I'll explore the options for merging these JSON objects into a single JSON file. Thanks for the help! – Sains Sep 17 '18 at 02:07
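For the merge discussed in these comments, one option that follows the answer's suggestion of piping the selected objects to another program is to have jq print them one per line (for example by adding -c to the -n --stream invocation) and then write them back out as a single object incrementally. A minimal Python sketch, assuming that line-per-object input; write_merged is a hypothetical name. Only one selected entry is held in memory at a time, so the merged output can be far larger than available RAM.

```python
import io
import json


def write_merged(lines, out, top_key="userActivities"):
    """Write single-key JSON objects (one per line) as one {top_key: {...}} document.

    Entries are copied to `out` as they are read, so only one line is held
    in memory at a time. Blank lines are skipped.
    """
    out.write("{" + json.dumps(top_key) + ":{")
    first = True
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for key, value in json.loads(line).items():
            if not first:
                out.write(",")
            out.write(json.dumps(key) + ":" + json.dumps(value, separators=(",", ":")))
            first = False
    out.write("}}")


# Example: two selected objects as jq -c would print them.
demo = io.StringIO()
write_merged(
    ['{"k1":{"localDate":"2018-08-01"}}\n', '{"k2":{"localDate":"2018-08-02"}}\n'],
    demo,
)
print(demo.getvalue())
```

Unlike the add-based jq version, duplicate keys are not collapsed here; with randomly generated keys such as L3ATRosRdbDgSmX75Z that should not arise, but it is an assumption.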