How can I filter by a numeric field using jq?

Question

I am writing a script to query the Bitbucket API and delete SNAPSHOT artifacts that have never been downloaded. The script is failing because it returns ALL snapshot artifacts; the select on the number of downloads does not appear to be working.

What is wrong with my select statement to filter objects by the number of downloads?

Of course the more direct solution here would be if I could just query the Bitbucket API with a filter. To the best of my knowledge the API does not support filtering by downloads.

My script is:

#!/usr/bin/env bash
curl -X GET --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" > downloads.json

# get all values | reduce the set to just be name and downloads | select entries where downloads is zero | select entries where name contains SNAPSHOT | just get the name
#TODO i screwed up the selection somewhere its returning files that contain SNAPSHOT regardless of number of downloads
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#unique sort, not sure why jq gives me multiple values
sort -u snapshots_without_any_downloads.js | tr -d '"' > unique_snapshots_without_downloads.js

cat unique_snapshots_without_downloads.js | xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/%" > deleted_files.txt

A deidentified sample of the raw input from the API is:

{
  "pagelen": 10,
  "size": 40,
  "values": [
    {
      "name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip",
      "links": {
        "self": {
          "href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
        }
      },
      "downloads": 2,
      "created_on": "2018-03-15T17:50:00.157310+00:00",
      "user": {
        "username": "me",
        "display_name": "me",
        "type": "user",
        "uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
        "links": {
          "self": {
            "href": "https://api.bitbucket.org/2.0/users/me"
          },
          "html": {
            "href": "https://bitbucket.org/me/"
          },
          "avatar": {
            "href": "https://bitbucket.org/account/me/avatar/32/"
          }
        }
      },
      "type": "download",
      "size": 430894
    },
    {
      "name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip",
      "links": {
        "self": {
          "href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
        }
      },
      "downloads": 0,
      "created_on": "2018-03-15T17:50:00.157310+00:00",
      "user": {
        "username": "me",
        "display_name": "me",
        "type": "user",
        "uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
        "links": {
          "self": {
            "href": "https://api.bitbucket.org/2.0/users/me"
          },
          "html": {
            "href": "https://bitbucket.org/me/"
          },
          "avatar": {
            "href": "https://bitbucket.org/account/me/avatar/32/"
          }
        }
      },
      "type": "download",
      "size": 430894
    },
    {
      "name": "myproject_1.0_mc_3.5.1.zip",
      "links": {
        "self": {
          "href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.1.zip"
        }
      },
      "downloads": 5,
      "created_on": "2018-03-15T17:49:14.885544+00:00",
      "user": {
        "username": "me",
        "display_name": "me",
        "type": "user",
        "uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
        "links": {
          "self": {
            "href": "https://api.bitbucket.org/2.0/users/me"
          },
          "html": {
            "href": "https://bitbucket.org/me/"
          },
          "avatar": {
            "href": "https://bitbucket.org/account/me/avatar/32/"
          }
        }
      },
      "type": "download",
      "size": 430934
    }
  ],
  "page": 1,
  "next": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=10&page=2"
}

The output I want from this snippet is myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip - that artifact is a SNAPSHOT and has zero downloads.

I have used this intermediate step to do some debugging:

jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads>0) | select(.name | contains("SNAPSHOT")) | unique' downloads.json > snapshots_with_downloads.js
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#this returns the same values for each list!
diff unique_snapshots_with_downloads.js unique_snapshots_without_downloads.js

This adjustment gives a cleaner, unique structure, which suggests that there's some sort of splitting or streaming aspect of jq that I do not fully understand:

#this returns a "unique" array like I expect, adding select to this still does not produce the desired outcome 
jq '.values | [{name: .[].name, downloads: .[].downloads}] | unique' downloads.json

The data after this step looks like this. It just removed the cruft I didn't need from the raw API response:

[
  {
    "name": "myproject_1.0_2400a51_mc_3.4.0.zip",
    "downloads": 0
  },
  {
    "name": "myproject_1.0_2400a51_mc_3.4.1.zip",
    "downloads": 2
  },
  {
    "name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.0.zip",
    "downloads": 0
  },
  {
    "name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.1.zip",
    "downloads": 2
  }
]

A [mcve] would include some sample JSON -- ideally, something as simple as possible to exemplify what you're trying to do, and then the shortest possible `jq` code that tries to do that thing. — Charles Duffy, Mar 16 '18 at 14:37
Thank you for adding data. That said, running `jq '.values | {name: .[].name, downloads: .[].downloads}' — Charles Duffy, Mar 16 '18 at 14:52
BTW, `== "0"` expects a string, whereas it would be `== 0` for an integer. Not sure why the former is present anywhere in the code. — Charles Duffy, Mar 16 '18 at 14:54
The "0" vs 0 is more debugging cruft. Thanks for holding my feet to the fire here, some sloppy debugging steps got copied into the question. — Freiheit, Mar 16 '18 at 14:56
`jq -rn '[inputs | .values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name] | unique | .[]'` — Charles Duffy, Mar 16 '18 at 15:00
Note the `jq -r` if you don't want content emitted in JSON format (with leading and trailing quotes). — Charles Duffy, Mar 16 '18 at 15:01
https://stedolan.github.io/jq/manual/#inputs is new to me also, "Outputs all remaining inputs, one by one." — Freiheit, Mar 16 '18 at 15:03
Right -- used in conjunction with `-n` (so the first input doesn't get eaten being the initial context). — Charles Duffy, Mar 16 '18 at 15:04
That's arguably overkill here, where our input file has only one JSON object (vs jq's ability to handle streams with multiple objects), though. The value is that it lets us demonstrate a practice that's guaranteed to provide globally unique outputs even with more interesting inputs. — Charles Duffy, Mar 16 '18 at 15:05
BTW, feel free to add your own answer as its own, separate answer, vs. offering it as an edit. — Charles Duffy, Mar 16 '18 at 15:44
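
To illustrate the `-n` / `inputs` point from the comments above, here is a minimal sketch using a made-up stream of three tiny JSON documents (the numbers 1, 2, 3 are placeholders, not Bitbucket data). It assumes jq 1.5 or later, where `inputs` is available.

```shell
#!/usr/bin/env bash
# With -n, jq does not consume an input for `.`, so `inputs` sees every document.
with_n=$(printf '1\n2\n3\n' | jq -cn '[inputs]')

# Without -n, the first document becomes `.` and `inputs` only sees the rest.
without_n=$(printf '1\n2\n3\n' | jq -c '[inputs]')

echo "with -n:    $with_n"
echo "without -n: $without_n"
```

Collecting everything into one array this way is what lets a single `unique` apply across all documents in the stream rather than to each one separately.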

score 2 · Accepted Answer · edited Mar 16 '18 at 15:37

As I understand it:

You want globally unique outputs
You want only items with downloads==0
You want only items whose name contains "SNAPSHOT"

The following will accomplish that:

jq -r '
[.values[] | {(.name): .downloads}]
| add
| to_entries[]
| select(.value == 0)
| .key | select(contains("SNAPSHOT"))'

Rather than making unique an explicit step, this version generates a map from names to download counters. The single-entry objects are merged with add, which means that in case of conflicts the last one wins, and since a key can appear only once in the resulting object, the outputs are guaranteed to be unique.

Given your test JSON, output is:

myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
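
The merge behaviour of `add` that the answer relies on can be checked in isolation. This is only a sketch on made-up objects (the keys `a` and `b` are placeholders): adding objects merges them, and a repeated key keeps the value that came last.

```shell
#!/usr/bin/env bash
# Object addition merges; for the repeated key "a" the later value (2) replaces 1.
merged=$(jq -cn '[{"a": 1}, {"a": 2}, {"b": 0}] | add')

# Same map, then keep only the keys whose value is 0.
zero_keys=$(jq -rn '[{"a": 1}, {"a": 2}, {"b": 0}] | add
                    | to_entries[] | select(.value == 0) | .key')

echo "$merged"
echo "$zero_keys"
```

Note that the values are not summed, so a name that occurs twice with different counters is judged only by its last occurrence.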

Applied to the overall problem context, the same strategy can also simplify the rest of the script:

jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[] | select(.value == 0) | .key | select(contains("SNAPSHOT"))'

Acting on the URL to the file rather than on the name alone simplifies the subsequent DELETE call, and the sort and tr calls can be removed.
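
As a rough sketch of how the URL-based variant could feed the DELETE step, the following runs it against a hypothetical two-entry response and only prints the curl commands (a dry run via `echo`). The file names `p_1.1-SNAPSHOT_a.zip` and `p_1.1-SNAPSHOT_b.zip` are invented for the example, and the credentials are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample response; the hrefs are placeholders, not real artifacts.
base='https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads'
sample='{"values":[
  {"name":"p_1.1-SNAPSHOT_a.zip","downloads":2,"links":{"self":{"href":"'"$base"'/p_1.1-SNAPSHOT_a.zip"}}},
  {"name":"p_1.1-SNAPSHOT_b.zip","downloads":0,"links":{"self":{"href":"'"$base"'/p_1.1-SNAPSHOT_b.zip"}}}
]}'

# URLs of SNAPSHOT artifacts with zero downloads, one per line, already unique.
urls=$(printf '%s\n' "$sample" | jq -r '
  [.values[] | {(.links.self.href): .downloads}] | add
  | to_entries[] | select(.value == 0)
  | .key | select(contains("SNAPSHOT"))')

# Dry run: print the DELETE commands instead of executing them.
commands=$(printf '%s\n' "$urls" | xargs -I % echo curl -sS -X DELETE --user "me:mykey" "%")
echo "$commands"
```

Dropping the `echo` would issue the real requests, so it seems worth inspecting the printed commands first.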

edited Mar 16 '18 at 15:37

Freiheit

answered Mar 16 '18 at 15:03

Charles Duffy

`myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip` has 2 downloads though. – Freiheit Mar 16 '18 at 15:05
Not the second copy of it. Look more carefully at your input document. – Charles Duffy Mar 16 '18 at 15:07
No normalization is needed. I am testing with some inputs. My debugging and my overly broad examples and my edits are likely goofing me up. – Freiheit Mar 16 '18 at 15:11
OH thats clever! I had to remove the `-n` flag. Thank you. I see how this works and I learned some new jq commands. – Freiheit Mar 16 '18 at 15:21
@CharlesDuffy - `[{a:1},{a:2}]|add` does NOT yield `{"a":3}`. – peak Mar 16 '18 at 15:22
@peak, ...argh, you're right. I've cleaned up the text of the answer in light of that; will probably also need to clean up misrepresentations in the comment history. – Charles Duffy Mar 16 '18 at 15:26

peak · Answer 2 · 2018-03-16T17:35:29.987

Here's a solution which sums up the .downloads values per .name before making the selection based on the total number of downloads:

reduce (.values[] | select(.name | contains("SNAPSHOT"))) as $v
  ({}; .[$v.name] += $v.downloads)
| with_entries(select(.value == 0))
| keys_unsorted[]

Example:

$ jq -r -f program.jq input.json
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
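
To make the difference between summing and merging visible, here is a small sketch on invented data in which the name `x-SNAPSHOT.zip` appears twice (once with 2 downloads, once with 0). The names are placeholders, not taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical input with a duplicated name.
dup='{"values":[
  {"name":"x-SNAPSHOT.zip","downloads":2},
  {"name":"x-SNAPSHOT.zip","downloads":0},
  {"name":"y-SNAPSHOT.zip","downloads":0}
]}'

# reduce with += accumulates the counters per name (null + 2 is 2 on first sight).
summed=$(printf '%s\n' "$dup" | jq -c '
  reduce .values[] as $v ({}; .[$v.name] += $v.downloads)')

# Merging with add keeps only the last counter seen for each name.
last_wins=$(printf '%s\n' "$dup" | jq -c '[.values[] | {(.name): .downloads}] | add')

echo "summed:    $summed"
echo "last wins: $last_wins"
```

With the summed map, `x-SNAPSHOT.zip` would not be selected for deletion; with the merged map it would be, so the choice matters whenever names can repeat.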

p.s.

What is wrong with my select statement ...?

The problem that jumps out is the bit of the pipeline just before the "select" filter:

.values | {name: .[].name, downloads: .[].downloads}

The use of .[] in this manner results in the Cartesian product being formed -- that is, the above expression will emit n*n JSON objects, where n is the length of .values, pairing every name with every download count. You evidently intended to write:

.values[] | {name: .name, downloads: .downloads}

which can be abbreviated to:

.values[] | {name, downloads}
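
The Cartesian product can be seen directly on a tiny input. This sketch uses a made-up two-entry document (names `a` and `b` are placeholders) and compares the number of objects each form emits.

```shell
#!/usr/bin/env bash
# Hypothetical input with two entries.
two='{"values":[{"name":"a","downloads":0},{"name":"b","downloads":2}]}'

# Two independent .[] iterations: every name is paired with every count (2*2 objects).
product=$(printf '%s\n' "$two" | jq -c '.values | {name: .[].name, downloads: .[].downloads}')

# One iteration over .values: each entry yields exactly one object.
paired=$(printf '%s\n' "$two" | jq -c '.values[] | {name, downloads}')

echo "product: $(printf '%s\n' "$product" | grep -c .) objects"
echo "paired:  $(printf '%s\n' "$paired" | grep -c .) objects"
printf '%s\n' "$paired"
```

The product includes the combination of name `b` with a count of 0, which never existed in the input. That kind of spurious pairing is why `select(.downloads == 0)` appeared to match every name.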

Thank you for the additional explanation of the core problem with my original code. — Freiheit, Mar 16 '18 at 21:00
