I would like to append non-matching fields from other records to the current record's field.

The first field of each record is a group ID. Each person should be paired with every person who has a different group ID; all such pairings are needed.

For example, given names.db:

1 Nikola Tesla
1 Pierre-Simon Laplace
1 Oliver Heaviside
2 James Watson
2 Francis Crick
3 Kanye West
4 Michael Faraday
4 Lord Rayleigh

turns into:

Nikola Tesla -> James Watson
Nikola Tesla -> Francis Crick
Nikola Tesla -> Kanye West
Nikola Tesla -> Michael Faraday
Nikola Tesla -> Lord Rayleigh

Pierre-Simon Laplace -> James Watson
Pierre-Simon Laplace -> Francis Crick
Pierre-Simon Laplace -> Kanye West
Pierre-Simon Laplace -> Michael Faraday
Pierre-Simon Laplace -> Lord Rayleigh

Oliver Heaviside -> James Watson
Oliver Heaviside -> Francis Crick
Oliver Heaviside -> Kanye West
Oliver Heaviside -> Michael Faraday
Oliver Heaviside -> Lord Rayleigh

James Watson -> Nikola Tesla
James Watson -> Pierre-Simon Laplace
James Watson -> Oliver Heaviside
James Watson -> Kanye West
James Watson -> Michael Faraday
James Watson -> Lord Rayleigh

Francis Crick -> Nikola Tesla
Francis Crick -> Pierre-Simon Laplace
Francis Crick -> Oliver Heaviside
Francis Crick -> Kanye West
Francis Crick -> Michael Faraday
Francis Crick -> Lord Rayleigh

Kanye West -> Pierre-Simon Laplace
Kanye West -> James Watson
Kanye West -> Oliver Heaviside
Kanye West -> Francis Crick
Kanye West -> Michael Faraday
Kanye West -> Nikola Tesla
Kanye West -> Lord Rayleigh

Michael Faraday -> Nikola Tesla
Michael Faraday -> Pierre-Simon Laplace
Michael Faraday -> Oliver Heaviside
Michael Faraday -> James Watson
Michael Faraday -> Francis Crick
Michael Faraday -> Kanye West

Lord Rayleigh -> Nikola Tesla
Lord Rayleigh -> Pierre-Simon Laplace
Lord Rayleigh -> Oliver Heaviside
Lord Rayleigh -> James Watson
Lord Rayleigh -> Francis Crick
Lord Rayleigh -> Kanye West
EarthIsHome
  • You can first do a cross product of the lines in the file (but that would be without awk). Then with awk, you can just check $1 == $3 and print $2->$4. Can you be a bit more specific about whether you want to use ONLY awk? – user3334059 Nov 25 '15 at 23:05
  • There's no reason for it to be only awk.. What tool would be easier in doing the cross product of the lines based on the first field? – EarthIsHome Nov 25 '15 at 23:07
  • Well, if you are not restricted to using bash, I would highly recommend writing a small python script to do this. For cross product you can check http://stackoverflow.com/questions/23363003/how-to-produce-cartesian-product-in-bash. There are other SO answers as well. – user3334059 Nov 25 '15 at 23:19
  • SQL - http://sqlfiddle.com/#!9/c1156c/11/0 (scroll up for the code). e.g. sqlite – TessellatingHeckler Nov 25 '15 at 23:20
  • I don't know what `...` and `and so on` specifically mean. Edit your question to get rid of all the ambiguity and just have clear, concrete, testable sample input and expected output. Otherwise we're just guessing, chances are we won't bother trying to create input to test a potential solution against, and even if we did we wouldn't KNOW whether the answer is right or not. – Ed Morton Nov 26 '15 at 00:38

2 Answers

I see what you mean.

Try this:

awk '{b=$1;sub($1" ","");a[$0]=b}END{for(i in a){for(j in a){if(i!=j&&a[i]!=a[j])print i" -> "j}print ""}}' file
bian
  • This is very close. What is `a[$0]` doing? I ran your line, which outputs 7 entries for Pierre-Simon Laplace when it should only return 5 entries (everyone who is not in the `1` category). I think I can modify this to work. – EarthIsHome Nov 26 '15 at 02:39
  • If the first character in `i` does not equal the first character in `j`, then `print i "-> "j` – EarthIsHome Nov 26 '15 at 03:05
  • Updated: added `=$1`. Tested OK. – bian Nov 26 '15 at 03:21
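
For readers with the same question about `a[$0]`: in the one-liner above, `a[$0]=b` stores each group ID keyed by the person's name, so the `END` block can compare the groups of any two names. Because `for (i in a)` visits keys in an unspecified order, the output order may differ from the input order, and duplicate names collapse into one key. The following is only a sketch of an order-preserving variant, not the answerer's code: the function name `pair_groups` and the arrays `group` and `name` are made up for illustration, and it assumes the group ID is the first space-separated field, as in the sample input.

```shell
# pair_groups is a hypothetical helper name, not part of the original answer.
# Assumption: each line is "<group ID> <name>", separated by a single space.
pair_groups() {
  awk '
    {
      group[NR] = $1        # remember the group ID of this record
      sub(/^[^ ]+ /, "")    # strip the group ID and the space after it from $0
      name[NR] = $0         # what remains is the full name
    }
    END {
      # Walk the records by line number so the output follows input order.
      for (i = 1; i <= NR; i++)
        for (j = 1; j <= NR; j++)
          if (group[i] != group[j])
            print name[i] " -> " name[j]
    }
  ' "$1"
}
```

It would be called as `pair_groups names.db`. Unlike the expected output in the question, this sketch does not print a blank line between each person's block.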
A non-awk solution

$ join -t' ' -j 9 names{,} \
     | sed -r '/([1-9]).*\1/d;s/[1-9]//;s/[1-9]/-->/'

  Nikola Tesla --> James Watson
  Nikola Tesla --> Francis Crick
  Nikola Tesla --> Kanye West
  Nikola Tesla --> Michael Faraday
  Nikola Tesla --> Lord Rayleigh
  Pierre-Simon Laplace --> James Watson
  Pierre-Simon Laplace --> Francis Crick
  Pierre-Simon Laplace --> Kanye West
  Pierre-Simon Laplace --> Michael Faraday
  Pierre-Simon Laplace --> Lord Rayleigh
  Oliver Heaviside --> James Watson
  Oliver Heaviside --> Francis Crick
  ...
  Michael Faraday --> Francis Crick
  Michael Faraday --> Kanye West
  Lord Rayleigh --> Nikola Tesla
  Lord Rayleigh --> Pierre-Simon Laplace
  Lord Rayleigh --> Oliver Heaviside
  Lord Rayleigh --> James Watson
  Lord Rayleigh --> Francis Crick
  Lord Rayleigh --> Kanye West

Explanation: create the cross product, delete lines where the two group digits match, remove the first digit, and replace the second digit with the arrow. Of course, it can all be done with awk, but I tried something else for a change.

karakfa