I have a set of integer pairs, where each pair describes a connection from a first integer to a second integer. For example:
1,2
3,4
5,6
5,7
6,8
I then load my data as follows, and group it:
data = load 'data.csv' using PigStorage(',') as (integer_1:int, integer_2:int);
grouped = group data by integer_1;
grouped_numbers = foreach grouped generate group as node, data.integer_2 as connection;
This yields a relation that pairs each first integer with a bag of its first-degree connections:
(1,{(2)})
(3,{(4)})
(5,{(6),(7)})
(6,{(8)})
I would then like to self-join the grouped_numbers relation so that each first integer is paired with both its first- and second-degree connections. In this case, that would be:
(1,{(2)})
(3,{(4)})
(5,{(6),(7),(8)})
(6,{(8)})
because 5 is connected to 6, which is connected to 8, so 8 is a second-degree connection of 5. How would I implement this in Pig?
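For reference, here is a minimal sketch of one way this might be approached. It is not necessarily the best approach. It joins the raw pairs rather than the grouped relation, on the assumption that joining flat tuples is simpler than joining on the contents of a bag. The aliases edges_a, edges_b, one_hop, two_hop, all_pairs, unique_pairs, by_node and connections_by_node are made up for illustration. The file is loaded twice under different aliases so that the two sides of the self-join have distinct names.

```pig
-- Load the same comma-separated file twice so the self-join has two distinct aliases.
edges_a = LOAD 'data.csv' USING PigStorage(',') AS (integer_1:int, integer_2:int);
edges_b = LOAD 'data.csv' USING PigStorage(',') AS (integer_1:int, integer_2:int);

-- First-degree connections are the pairs as given.
one_hop = FOREACH edges_a GENERATE integer_1 AS node, integer_2 AS connection;

-- Second-degree connections: match the end of one pair to the start of another.
joined = JOIN edges_a BY integer_2, edges_b BY integer_1;
two_hop = FOREACH joined GENERATE edges_a::integer_1 AS node, edges_b::integer_2 AS connection;

-- Combine both sets and drop duplicates (a node may be reachable in one and two hops).
all_pairs = UNION one_hop, two_hop;
unique_pairs = DISTINCT all_pairs;

-- Collect the connections for each node into a bag.
by_node = GROUP unique_pairs BY node;
connections_by_node = FOREACH by_node GENERATE group AS node, unique_pairs.connection AS connection;

DUMP connections_by_node;
```

With the sample data above, the join should match only (5,6) with (6,8), which adds 8 to the connections of 5. The order of tuples inside each bag, and of the rows themselves, is not guaranteed.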