
Consider the following JSON containing a list of key-value pairs:

{
  "session1": 128,
  "session2": 1048596,
  "session3": 3145728,
  "session4": 3145828,
  "session5": 11534338,
  "session6": 11544336,
  "session7": 2097252
}

The key is a session identifier, and the value is the size, in bytes, of the data stored in that session.

I want to print counts of values by range, the ranges being (lower bound included, upper bound excluded): 0-1MB, 1-2MB, 2-3MB, ..., 12-13MB.

 1MB =  1048576
 2MB =  2097152
 3MB =  3145728
 4MB =  4194304
 5MB =  5242880
 6MB =  6291456
 7MB =  7340032
 8MB =  8388608
 9MB =  9437184
10MB = 10485760
11MB = 11534336
12MB = 12582912
13MB = 13631488

The expected output is

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "10-11MB": 2
}

The above is just representative; suggestions are welcome.


2 Answers


The following should work:

to_entries
| map(.value / 1048576 | floor | [tostring, "-", (.+1 | tostring), "MB"] | add)
| group_by(.)
| map({"key": .[0], "value": length})
| from_entries

For your input, it produces the following output:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "11-12MB": 2,
  "2-3MB": 1,
  "3-4MB": 2
}

(11534338 and 11544336 are counted in the "11-12MB" bucket rather than the "10-11MB" one, because 11*2^20 = 11534336, and those numbers are larger than that.)
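
If you want to double-check which bucket a particular value falls into, a quick one-liner (using jq's -n flag for null input) does the arithmetic:

jq -n '11534338 / 1048576 | floor'
# => 11, i.e. the "11-12MB" bucket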

If you wanted the keys in numeric order, you could also convert them to your preferred string labels after the group_by:

to_entries
| map(.value / 1048576 | floor)
| group_by(.)
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})
| from_entries

Which produces:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}

Both solutions have the same basic steps (an annotated walkthrough with intermediate values follows the list):

  1. Convert the input object to an array of {"key": x, "value": y} entries (to_entries).
  2. Map the entries into something that identifies the range they're in, by rounding down to the nearest megabyte (.value / 1048576 | floor).
  3. Group by the value (group_by). This produces an array like [[0], [1], [2], [3, 3], [11, 11]] for your input.
  4. For each group, produce an entry where the "key" field is the range label ("X-YMB") and the "value" is the number of elements in the group (length).
  5. Convert the list of entries back to a single object (from_entries).
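
As an annotated sketch of the second variant, here it is again with the intermediate value after each stage shown in a comment (values computed from the sample input above):

to_entries                        # [{"key": "session1", "value": 128}, {"key": "session2", "value": 1048596}, ...]
| map(.value / 1048576 | floor)   # [0, 1, 3, 3, 11, 11, 2]
| group_by(.)                     # [[0], [1], [2], [3, 3], [11, 11]]
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})   # [{"key": "0-1MB", "value": 1}, ...]
| from_entries                    # {"0-1MB": 1, "1-2MB": 1, "2-3MB": 1, "3-4MB": 2, "11-12MB": 2}
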
fizzie
  • Sometimes one just runs out of words and this is one of those moments. Thank you so much for the elaborate answer and going beyond, almost reading my mind, and providing the bit about sorting the keys in numeric order. Much appreciated. – Jojo Thomas Jul 30 '23 at 11:36

Here's an approach using reduce: it iterates over the input values integer-divided by 1MB, and successively increments the corresponding result field by one.

reduce (.[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
{
  "0-1MB": 1,
  "1-2MB": 1,
  "3-4MB": 2,
  "11-12MB": 2,
  "2-3MB": 1
}

Demo
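
A note on why no explicit initialization of the counters is needed: reading a missing field in jq yields null, and null + 1 evaluates to 1, so the first increment of a bucket simply creates its field. A minimal illustration (using jq's -n and -c flags for null input and compact output):

jq -nc '{} | ."3-4MB" += 1 | ."3-4MB" += 1'
# => {"3-4MB":2}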

The stream of numbers iterated over can, of course, be sorted first to get an object with its field names in increasing numeric order:

reduce (map(.) | sort[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}

Demo
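
Assuming the input from the question is saved as sessions.json (a placeholder filename), either variant can be run from the shell like this:

# sessions.json stands in for your actual input file
jq 'reduce (map(.) | sort[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)' sessions.json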

pmf
  • Thank you @pmf. You have also provided a link to Demo which is super-helpful. I have also posted a separate question asking if the sum can be printed - i.e., check which range is having the most size. – Jojo Thomas Jul 30 '23 at 13:25
  • Forgot I can only post one every 90 minutes. So, my other question was not posted yet. [It is live now](https://stackoverflow.com/questions/76798065/for-a-simple-key-value-pair-list-json-use-jq-to-print-the-sum-of-values-by-rang) – Jojo Thomas Jul 30 '23 at 13:55