
The following are the first two items in my JSON file:

{
  "ReferringUrl": "N",
  "OpenAccess": "0",
  "Properties": {
    "ItmId": "1694738780"
  }
}
{
  "ReferringUrl": "L",
  "OpenAccess": "1",
  "Properties": {
    "ItmId": "1347809133"
  }
}

I want to count how many items there are for each ItmId in the JSON. For example, suppose items with "ItmId" 1694738780 appear 10 times and items with "ItmId" 1347809133 appear 14 times in my JSON file. I would then like to return JSON like this:

{"ItemId": "1694738780",
 "Count":  10
}
{"ItemId": "1347809133",
 "Count":  14
}

I am using bash, and would prefer to do this entirely with jq, but other methods are OK too.

Thank you!!!

Eleanor

5 Answers


Here's one solution (assuming the input is a stream of valid JSON objects and that you invoke jq with the -s option):

map({ItemId: .Properties.ItmId})             # extract the ItmID values
| group_by(.ItemId)                          # group by "ItemId"
| map({ItemId: .[0].ItemId, Count: length})  # store the counts
| .[]                                        # convert to a stream
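As a minimal end-to-end sketch, here is the program above run with -s against the two objects from the question (the filename input.json is illustrative; -c is added for compact one-object-per-line output):

```shell
# Sketch, assuming jq is installed; the filename is illustrative.
cat > input.json <<'EOF'
{"ReferringUrl":"N","OpenAccess":"0","Properties":{"ItmId":"1694738780"}}
{"ReferringUrl":"L","OpenAccess":"1","Properties":{"ItmId":"1347809133"}}
EOF

jq -sc 'map({ItemId: .Properties.ItmId})
        | group_by(.ItemId)
        | map({ItemId: .[0].ItemId, Count: length})
        | .[]' input.json
# {"ItemId":"1347809133","Count":1}
# {"ItemId":"1694738780","Count":1}
```

Note that group_by sorts its input, which is why the IDs come out in ascending order here.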

A slightly more memory-efficient approach would be to use inputs if your jq has it; but in that case, use -n instead of -s, and replace the first line above by: [inputs | {ItemId: .Properties.ItmId} ]

Efficient solution

The above solutions use the built-in group_by, which is convenient but introduces easily avoided inefficiencies: group_by sorts its input. Using the following counter makes it easy to write a very efficient solution:

def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

Invoke jq with the -n command-line option and apply the counter as follows:

counter(inputs | .Properties.ItmId)

this leads to a dictionary of counts:

{
  "1694738780": 1,
  "1347809133": 1
}

Such a dictionary is probably more useful than a stream of singleton objects as envisioned by the OP, but if such a stream is needed, the above can be modified as follows:

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: (.key), Count: .value}
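Putting the pieces together, a minimal sketch of the counter-based approach (filenames items.json and counts.jq are illustrative) might look like:

```shell
# Sketch, assuming jq is installed; filenames are illustrative.
cat > items.json <<'EOF'
{"ReferringUrl":"N","OpenAccess":"0","Properties":{"ItmId":"1694738780"}}
{"ReferringUrl":"L","OpenAccess":"1","Properties":{"ItmId":"1347809133"}}
EOF

cat > counts.jq <<'EOF'
def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: .key, Count: .value}
EOF

# -n is required so that inputs consumes the whole object stream.
jq -nc -f counts.jq items.json
# {"ItemId":"1694738780","Count":1}
# {"ItemId":"1347809133","Count":1}
```

Because jq objects preserve key insertion order, the IDs come out in first-seen order rather than sorted, and no sorting pass is performed at all.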
peak
  • It's probably best to put the program into a file, say program.jq, and invoke jq like so: jq -s -f program.jq INPUT.json – peak Jul 18 '17 at 15:46
  • jq '[.] | map({Country: .pubcountry}) | .[]' 1.ndjson| jq -s . | jq 'group_by(.Country) | map({Country: .[].Country, Count: length}) | unique' > 2.json This is how I do after I revise your codes and it works! – Eleanor Jul 18 '17 at 16:05
  • Eleanor - There is no need to invoke jq twice. Also '. | EXP' is equivalent to just 'EXP'. Your approach (using unique) is very inefficient, both in time and memory requirements. Why not just grab the id from .[0] ? – peak Jul 18 '17 at 18:47
  • I am very new to jq and bash, so I didn't fully understand what you said before. I tried the code you posted and it didn't work, so I added some format transformation to make it work. :) – Eleanor Jul 18 '17 at 21:01
  • If you aren't already doing so, I would suggest putting your jq program in a file (say program.jq) and invoking jq with the `-f program.jq` option. There are other tricks that bash makes possible as well, but they're all a bit ... shall we say ... tricky :-) – peak Jul 18 '17 at 21:11
  • Ok I ll try it. Thank you! :) – Eleanor Jul 18 '17 at 21:14
  • Excellent, just what I needed. Could you explain the more efficient method and give the command to do so? Is there a way to output all the result in {} instead of []? – Olivier LAHAYE Nov 22 '19 at 17:57

Using the jq command combined with sort, uniq -c, and awk:

jq '.Properties.ItmId' json.txt | sort | uniq -c | awk '{print "{\"ItmId\":" $2 ",\"count\":" $1 "}"}' | jq .
skr
1

Here's a super-efficient solution -- in particular, no sorting is required. The following implementation requires a version of jq with inputs but it is easy to adapt the program to use earlier versions of jq. Please remember to use the -n command-line option if using the following:

# Count the occurrences of distinct values of (stream|tostring).
# To avoid unwanted collisions, or to recover the exact values,
# consider using tojson
def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: (.key), Count: .value}
peak
1

Here is a variation that uses reduce, setpath, and getpath to do the aggregation, and to_entries to do the final formatting. It assumes you run jq as

jq --slurp -f query.jq < data.json

where data.json contains your data and query.jq contains

  map(.Properties.ItmId)
| reduce .[] as $i (
    {}; setpath([$i]; getpath([$i]) + 1)
  )
| to_entries | .[] | { "ItemId": .key, "Count": .value }
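For completeness, here is a self-contained sketch of that invocation against the two objects from the question (the filenames match the answer; -c is added for compact output):

```shell
# Sketch, assuming jq is installed.
cat > data.json <<'EOF'
{"ReferringUrl":"N","OpenAccess":"0","Properties":{"ItmId":"1694738780"}}
{"ReferringUrl":"L","OpenAccess":"1","Properties":{"ItmId":"1347809133"}}
EOF

cat > query.jq <<'EOF'
  map(.Properties.ItmId)
| reduce .[] as $i (
    {}; setpath([$i]; getpath([$i]) + 1)
  )
| to_entries | .[] | { "ItemId": .key, "Count": .value }
EOF

jq -c --slurp -f query.jq < data.json
# {"ItemId":"1694738780","Count":1}
# {"ItemId":"1347809133","Count":1}
```

getpath on a missing key yields null, and null + 1 is 1 in jq, which is what seeds each counter.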
jq170727
0

If text output suffices, you can do this simply with

$ jq -r <test.json '.Properties.ItmId' |sort |uniq -c
   2 1347809133
   1 1694738780

If you really need JSON lines, you could do this with

$ jq -r <test.json '.Properties.ItmId' |sort |uniq -c |awk '{printf "{\"ItemId\": \"%s\", \"Count\": %s}\n",$2,$1 }'
{"ItemId": "1347809133", "Count": 2}
{"ItemId": "1694738780", "Count": 1}
Scott Centoni