
I use the ec2.py dynamic inventory script with Ansible to extract a list of EC2 hosts and their tag names. It returns JSON like the following:

  "tag_aws_autoscaling_groupName_asg_test": [
    "aa.b.bb.55",
    "1b.b.c.d"
  ],

  "tag_aws_autoscaling_groupName_asg_unknown": [
    "aa.b.bb.55",
    "1b.b.c.e"
  ],

I'm using jq for parsing this output.

  1. How can I extract only the fields common to both of these ASGs?
  2. How can I extract only the fields unique to each of these ASGs?
peak
Kumaran S
  • 1. Please provide at least one minimal example with valid JSON input, and the expected outputs. 2. Your questions are very unclear in several respects. For example, when you say "field", do you mean the key names (such as "tag_aws_autoscaling_groupName_asg_test") or the key values (the arrays)? – peak Dec 16 '16 at 13:37

2 Answers


difference/2

Because of the way jq's "-" operator is defined on arrays, one invocation of unique is sufficient to produce a "uniquified" answer:

def difference($a; $b): ($a | unique) - $b;

Similarly, for the symmetric difference, a single sorting operation is sufficient to produce a "uniquified" value:

def sdiff($a; $b): (($a-$b) + ($b-$a)) | unique;
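As a quick check, the two definitions above can be run against the host lists from the question. This is only a sketch: it assumes jq 1.5 or later (needed for `$`-prefixed function parameters), uses `-n` with inline arrays in place of the real inventory output, and the shell variable `out` is just an illustrative name.

```shell
# -n: run without an input file; -c: compact, one result per line.
# The two arrays are the sample host lists from the question.
out=$(jq -nc '
  def difference($a; $b): ($a | unique) - $b;
  def sdiff($a; $b): (($a-$b) + ($b-$a)) | unique;
  ["aa.b.bb.55", "1b.b.c.d"] as $test
  | ["aa.b.bb.55", "1b.b.c.e"] as $unknown
  | difference($test; $unknown), sdiff($test; $unknown)
')
echo "$out"
# ["1b.b.c.d"]
# ["1b.b.c.d","1b.b.c.e"]
```

The first line is the set of hosts found only in the first list; the second is the set of hosts found in exactly one of the two lists.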

intersect/2

Here is a faster version of intersect/2 that should work with all versions of jq -- it eliminates group_by in favor of sort:

def intersect(x;y):
  ( (x|unique) + (y|unique) | sort) as $sorted
  | reduce range(1; $sorted|length) as $i
      ([];
       if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
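Because x and y are filter parameters here, intersect/2 can be handed paths into the input document directly. The sketch below assumes the inventory output is wrapped in braces so that it is a valid JSON object; the shell variable `common` is an illustrative name.

```shell
# Pipe a valid JSON object (the question's data, wrapped in braces) into jq.
common=$(printf '%s' '{
  "tag_aws_autoscaling_groupName_asg_test":    ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}' | jq -c '
  def intersect(x;y):
    ( (x|unique) + (y|unique) | sort) as $sorted
    | reduce range(1; $sorted|length) as $i
        ([];
         if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  intersect(.tag_aws_autoscaling_groupName_asg_test;
            .tag_aws_autoscaling_groupName_asg_unknown)
')
echo "$common"
# ["aa.b.bb.55"]
```

Since each array is made unique before concatenation, a value can occur at most twice in the sorted list, so any adjacent equal pair marks a common element.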

intersection/2

If you have jq 1.5, then here's a similar but still measurably faster set-intersection function: it produces a stream of the elements in the set-intersection of the two arrays:

def intersection(x;y):
  (x|unique) as $x | (y|unique) as $y
  | ($x|length) as $m
  | ($y|length) as $n
  | if $m == 0 or $n == 0 then empty
    else { i:-1, j:-1, ans:false }
    | while(  .i < $m and .j < $n;
        $x[.i+1] as $nextx
        | if $nextx == $y[.j+1] then {i:(.i+1), j:(.j+1), ans: true, value: $nextx}
          elif  $nextx < $y[.j+1] then .i += 1 | .ans = false
          else  .j += 1 | .ans = false
          end )
    end
  | if .ans then .value else empty end ;
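Since intersection/2 emits a stream rather than an array, a caller that wants an array can wrap the call in `[...]`. A minimal sketch with the question's host lists, assuming jq 1.5 or later (for `while`); the shell variable `common` is an illustrative name.

```shell
common=$(jq -nc '
  def intersection(x;y):
    (x|unique) as $x | (y|unique) as $y
    | ($x|length) as $m
    | ($y|length) as $n
    | if $m == 0 or $n == 0 then empty
      else { i:-1, j:-1, ans:false }
      | while(  .i < $m and .j < $n;
          $x[.i+1] as $nextx
          | if $nextx == $y[.j+1] then {i:(.i+1), j:(.j+1), ans: true, value: $nextx}
            elif  $nextx < $y[.j+1] then .i += 1 | .ans = false
            else  .j += 1 | .ans = false
            end )
      end
    | if .ans then .value else empty end ;
  # Collect the streamed elements into a single array.
  [ intersection(["aa.b.bb.55", "1b.b.c.d"]; ["aa.b.bb.55", "1b.b.c.e"]) ]
')
echo "$common"
# ["aa.b.bb.55"]
```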
peak

To find items common between two arrays, just perform a set intersection between the two. There's no intersection function available but it should be simple enough to define on your own. Take the unique items of each array, group them up by value, then take the items where there are more than 1 in a group.

def intersect($a; $b): [($a | unique)[], ($b | unique)[]]
    | [group_by(.)[] | select(length > 1)[0]];

Using this, to find the common elements (assuming your input is actually a valid JSON object):

$ jq 'def intersect($a; $b): [($a | unique)[], ($b | unique)[]]
    | [group_by(.)[] | select(length > 1)[0]];
intersect(.tag_aws_autoscaling_groupName_asg_test;
          .tag_aws_autoscaling_groupName_asg_unknown)' < input.json
[
  "aa.b.bb.55"
]

To find items unique to an array, just perform the set difference.

$ jq 'def difference($a; $b): ($a | unique) - ($b | unique);
difference(.tag_aws_autoscaling_groupName_asg_test;
           .tag_aws_autoscaling_groupName_asg_unknown)' < input.json
[
  "1b.b.c.d"
]
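The set difference is not symmetric, so finding the hosts unique to each group takes one call per direction. The sketch below runs both in a single jq invocation; it assumes the input is a valid JSON object, and the output keys `only_test` and `only_unknown` as well as the shell variable `uniq_hosts` are hypothetical names chosen for illustration.

```shell
uniq_hosts=$(printf '%s' '{
  "tag_aws_autoscaling_groupName_asg_test":    ["aa.b.bb.55", "1b.b.c.d"],
  "tag_aws_autoscaling_groupName_asg_unknown": ["aa.b.bb.55", "1b.b.c.e"]
}' | jq -c '
  def difference($a; $b): ($a | unique) - ($b | unique);
  # Bind both arrays once, then take the difference in each direction.
  .tag_aws_autoscaling_groupName_asg_test as $t
  | .tag_aws_autoscaling_groupName_asg_unknown as $u
  | { only_test: (difference($t; $u)), only_unknown: (difference($u; $t)) }
')
echo "$uniq_hosts"
# {"only_test":["1b.b.c.d"],"only_unknown":["1b.b.c.e"]}
```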
Jeff Mercado
  • Thanks a lot Jeff and Peak. Both of your inputs are really valuable and helped me out. Actual line from my code after using variables for each tags that might help other folks.. match_hosts=$($inventory_script_dir/ec2.py | jq --arg c "$tag1" --arg d "$tag2" 'def intersect(a; b): [(a | unique)[], (b | unique)[]] | [group_by(.)[] | select(length > 1)[0]];intersect(.[$c];.[$d])') – Kumaran S Dec 17 '16 at 08:09