
Given the input JSON:

[
  {"title": "first line"},
  {"title": "second line"},
  {"title": "third line"}
]

How can we extract only the entries whose titles contain keywords listed in a second "filter" array? For instance, using a shell variable:

filter='["second", "third"]'

The output in this case would be

[
  {"title": "second line"},
  {"title": "third line"}
]

Also, how can the filter array be used to negate the match instead? E.g., return only the "first line" entry in the previous example.

There is a similar reply but using an old version of jq. I hope that there's a more intuitive/readable way to do this with the current version of jq.

peak
Bernard

2 Answers


You can combine jq with a bash array to build the filter. First, define the keywords as a bash array, as below. Note that bash array definitions separate elements with whitespace, not with commas.

Next, we need a regex that matches any of the keywords, so we join the array elements with the alternation operator |:

filter=("first" "second")
echo "$(IFS="|"; echo "${filter[*]}")"
first|second

You haven't mentioned whether the keyword must appear at the start or the end of .title, or may appear anywhere in it. The regex below matches the keyword anywhere in the string.

Now use this regex in jq to match against the .title string, as below. Note the use of not to negate the result. To get the positive match instead, remove the |not part.

jq --arg re "$(IFS="|"; echo "${filter[*]}")" '[.[] | select(.title|test($re)|not)]' < json
Inian
  • The use of `test()` is the key here. Thanks for also adding that clever way to parse a shell array and pass the result as input. – Bernard Feb 11 '19 at 21:20
  • @Alkaline - Please note that jq's `test` is based on regex matches, whereas the original question calls for keyword matches. In general, using `test` naively can give very different results compared to string-based keyword matching. – peak Feb 11 '19 at 23:06
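
To illustrate the distinction raised in the comment above, here is a minimal sketch that matches keywords as literal substrings using jq's contains rather than test. The variable names input, matched and cmp, and the "version 105" sample string, are made up for this illustration. It assumes jq 1.5 or later (for --argjson and the two-argument any) built with regex support.

```shell
# Sample data from the question, inlined so the sketch is self-contained.
input='[{"title":"first line"},{"title":"second line"},{"title":"third line"}]'
filter='["second", "third"]'

# Keep entries whose title contains at least one keyword as a literal
# substring; contains on strings does no regex interpretation.
matched=$(printf '%s' "$input" | jq -c --argjson filter "$filter" '
  map(select(.title as $t | any($filter[]; . as $kw | $t | contains($kw))))')
echo "$matched"

# Why this matters: "." is a regex wildcard for test, but a plain
# character for contains.
cmp=$(jq -n -c '"version 105" | [test("1.5"), contains("1.5")]')
echo "$cmp"
```

With the sample data, both keywords are found in their titles, so the second and third entries are kept. The last command shows test and contains disagreeing on a keyword that contains a regex metacharacter.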

One way to solve a problem that involves the word "any" is often to use jq's any, e.g. using your shell variable:

jq --argjson filter "$filter" '
  map((.title | split(" ")) as $title
      | select(any( $title[] as $t
                    | $filter[] as $kw
                    | $kw == $t )))' input.json

Negation

As in formal logic, you can use all or any (in conjunction with negation) to solve the negated problem. But don't forget that jq's not is a zero-arity filter: it takes no arguments, so it is written as a pipeline step (expr | not) rather than as not(expr).

jq --argjson filter "$filter" '
  map((.title | split(" ")) as $title
      | select(all( $title[] as $t
                    | $filter[] as $kw
                    | $kw != $t )))' input.json
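
As a sketch of the other option mentioned above (any combined with not), the negated selection could also be written as follows. The variable names input and unmatched are made up for this illustration; it assumes jq 1.5 or later for --argjson and the two-argument form of any.

```shell
# Sample data from the question, inlined so the sketch is self-contained.
input='[{"title":"first line"},{"title":"second line"},{"title":"third line"}]'
filter='["second", "third"]'

# Select entries for which NO word of the title equals a keyword.
# "not" is applied as a pipeline step to the boolean produced by any.
unmatched=$(printf '%s' "$input" | jq -c --argjson filter "$filter" '
  map(select(any(.title | split(" ")[];
                 . as $w | any($filter[]; . == $w))
             | not))')
echo "$unmatched"
```

With the sample data this should leave only the "first line" entry, the same result as the all-based version.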

Other approaches

The above uses "keyword matching" as that is what the question specifies, but of course the above jq expressions can easily be modified to use regexes or some other type of matching.

If the list of keywords is very long, then a better algorithm for array-intersection would no doubt be desirable.
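
One possible direction for long keyword lists, offered as a sketch rather than a tuned solution: turn the keyword array into a lookup object once, so each title word is checked with has instead of being compared against every keyword. The names input, hits and $set are made up for this illustration, and treating a lookup object as the "better algorithm" is an assumption.

```shell
# Sample data from the question, inlined so the sketch is self-contained.
input='[{"title":"first line"},{"title":"second line"},{"title":"third line"}]'
filter='["second", "third"]'

# Build {"second": true, "third": true} once, then test each title word
# for membership with has instead of scanning the whole keyword list.
hits=$(printf '%s' "$input" | jq -c --argjson filter "$filter" '
  ($filter | map({key: ., value: true}) | from_entries) as $set
  | map(select(any(.title | split(" ")[]; . as $w | $set | has($w))))')
echo "$hits"
```

Appending | not after the any(...) call inside select would give the negated selection.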

peak