I am trying to correct one file with another using a single line of AWK code. I want to take $1 from FILE2, look it up in FILE1, and get the corresponding $2 and $4. After I set them as variables, I want the program to stop evaluating FILE1, change $10 and $11 in FILE2 to the values of the variables, and print the result.

I am having trouble getting awk to switch from FILE1 to FILE2 after I have extracted the variables. I've tried nextfile, but this resets the program and it tries to extract the variables from FILE2. I also tried setting NR to the last record, but it did not switch.

I am also using a shell loop to get each line out of FILE1. If that could be part of the awk script, I am sure it would speed things up, since awk would not have to be restarted over and over.

Here are the parts I have figured out.

for file in `cut -f 1 FILE2`; do
awk -v a=$file '$1=a{s=$2;q=$4; ---GO TO FILE1---}{if ($1==a) {$10=s; $11=q; print 0;exit}' FILE1 FILE2 >> FILEOUT
done

A quick example set. NOTE: despite how I have written this, the two files are not in the same order and are on the order of 8 GB in size, so they are a little unwieldy to sort.

FILE1

A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE

FILE2

A 2 3 4 5 6 7 8 9 345 D$J 
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT

OUTFILE

A 2 3 4 5 6 7 8 9 12345 AJD$JD 
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE

This is the code I got to work based on Kent's answer below.

awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{$9=$9" "a[$1]}{$10="";$11=""}2' f1 f2 
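The command above appends the new values to $9 and then blanks the old $10 and $11, which leaves empty fields (and extra spaces) in the output. A minimal sketch of a variant that assigns $10 and $11 directly, so any optional fields after $11 pass through untouched, is shown here. The array names `second` and `fourth`, the temporary directory, and the extra 12th field on the first line of f2 are assumptions made for illustration.

```shell
# Sample data from the question; the "extra" 12th field is hypothetical.
dir=$(mktemp -d)
cat > "$dir/f1" <<'EOF'
A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE
EOF
cat > "$dir/f2" <<'EOF'
A 2 3 4 5 6 7 8 9 345 D$J extra
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
EOF

# While reading the first file (NR==FNR), remember $2 and $4 under the key $1.
# While reading the second file, overwrite $10 and $11 in place when the key is known.
# The trailing 1 is an always-true pattern, so every line of f2 is printed.
result=$(awk 'NR==FNR { second[$1]=$2; fourth[$1]=$4; next }
              $1 in second { $10=second[$1]; $11=fourth[$1] }
              1' "$dir/f1" "$dir/f2")
printf '%s\n' "$result"
```

Lines of f2 whose key is missing from f1 are printed unchanged, and both lookup arrays still have to fit in memory.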
jeffpkamp

2 Answers

3

No need to loop over the files repeatedly - just read one file and store the relevant fields in arrays keyed on $1, then go through the other file and use those arrays to look up the values you want to insert.

awk '(FILENAME=="FILE1"){y[$1]=$2;z[$1]=$4}; (FILENAME=="FILE2" && $1 in y){$10=y[$1];$11=z[$1];print $0}' FILE1 FILE2
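The FILENAME tests above only match when the inputs are literally named FILE1 and FILE2, and the names must be quoted string literals: an unquoted FILE1 inside the awk program is an unset variable that compares as an empty string, so neither block would ever run. A sketch that avoids hard-coding the names by comparing FILENAME with ARGV[1] (the first file operand) follows. The file names `in1.txt` and `in2.txt` and the temporary directory are assumptions for illustration.

```shell
# Sample data from the question under arbitrary (hypothetical) file names.
dir=$(mktemp -d)
cat > "$dir/in1.txt" <<'EOF'
A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE
EOF
cat > "$dir/in2.txt" <<'EOF'
A 2 3 4 5 6 7 8 9 345 D$J
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
EOF

# ARGV[1] holds the first file operand exactly as typed, which is also what
# FILENAME is set to while that file is being read.
result=$(awk 'FILENAME==ARGV[1] { y[$1]=$2; z[$1]=$4; next }
              $1 in y { $10=y[$1]; $11=z[$1]; print }' "$dir/in1.txt" "$dir/in2.txt")
printf '%s\n' "$result"
```

As in the original command, lines of the second file whose key is absent from the first are not printed at all.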

That said, it sounds like you might have a use for the join command here rather than messing about with awk (the above script assumes all your $1/$2/$4 values will fit in memory).
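For reference, a rough sketch of the join route on the sample data. It rests on assumptions worth stating: join needs both inputs sorted on the join field, so each file is sorted first (which the question describes as unwieldy at 8 GB), the output comes out in key order rather than FILE2's original order, and the `-o` list keeps only the eleven named fields, so any optional fields after $11 would be dropped. The temporary directory and the `.sorted` file names are illustrative.

```shell
# Sample data from the question, deliberately out of order.
dir=$(mktemp -d)
cat > "$dir/FILE1" <<'EOF'
C 52042 + DSJTJE
A 12345 + AJD$JD
B 12504 + DKFJ#%
EOF
cat > "$dir/FILE2" <<'EOF'
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
A 2 3 4 5 6 7 8 9 345 D$J
EOF

# Sort both files on the key with the same collation that join will use.
LC_ALL=C sort -k1,1 "$dir/FILE1" > "$dir/FILE1.sorted"
LC_ALL=C sort -k1,1 "$dir/FILE2" > "$dir/FILE2.sorted"

# -o picks the output fields: $1-$9 from FILE2 (file 2), then $2 and $4 from FILE1 (file 1).
result=$(LC_ALL=C join -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,1.2,1.4 \
         "$dir/FILE1.sorted" "$dir/FILE2.sorted")
printf '%s\n' "$result"
```

Unlike the awk version, this keeps neither file in memory, at the cost of the two sorts.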

pobrelkey
  • Quick question: should the second part start "(FILENAME=="FILE2" && x[$1]==$1)"? – jeffpkamp Dec 05 '13 at 22:33
  • No. `x` is intended to be an array that lets us easily check which values of `$1` exist in FILE1. The fact that a key exists in that array is the important thing - the value stored at that key is irrelevant (I just used the constant `1`). – pobrelkey Dec 05 '13 at 22:38
  • Actually, my code was a bit brain-dead - reading the other answer reminded me of the `in` operator, which I probably should have used instead. I've edited my answer to use this and get rid of `x` - hopefully the intent of that check is now clear. – pobrelkey Dec 05 '13 at 23:16
  • I'm still working through understanding this (I am having a hard time grasping arrays). However the output I get has nothing for $10 and $11. – jeffpkamp Dec 05 '13 at 23:58
  • The description above was not accurate. I don't get any output if I have it set up with FILENAME==FILE1 and FILENAME==FILE2. When I make them the same, I get the file unaltered (if I correct the columns put into the array). – jeffpkamp Dec 06 '13 at 00:21
3

try this one-liner:

kent$  awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{NF-=2;$0=$0" "a[$1]}7' f1 f2
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
Kent
  • I checked this on my files, and I am just getting the unchanged f2 as output. I think one problem is that F1 has all the lines in F2, but not vice versa. I guess I could try it with the full set and see what happens. – jeffpkamp Dec 05 '13 at 23:59
  • @user2348290 That shouldn't be a problem. Can you test with the small example in your question to see whether my one-liner works for it? – Kent Dec 06 '13 at 09:29
  • Alright, I figured out what was going on ($1 in f1 had a @ in front of the $1). The only problem I am having is that there are optional fields after $11 which are sometimes there and sometimes not (sorry I didn't include this in my example). How can I get this script to strictly replace $10 and $11 with the array input? – jeffpkamp Dec 06 '13 at 16:56
  • Okay, so I edited the code to do what I wanted it to do and put it in the original post, but I don't understand two parts of it. Part 1: I don't understand why the output prints without a print command. Part 2 (which I think is part of my first question): what does the number at the end of the statement, "7", do? I changed it to other numbers with no clear effect. Just curious. – jeffpkamp Dec 06 '13 at 17:18
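On the last comment's two questions: an awk program is a list of `pattern { action }` rules, and a rule that has a pattern but no action falls back to the default action, which is to print the current line. A bare number such as `7` is a pattern that is always true because it is non-zero, which is why changing it to other non-zero numbers has no visible effect. A small sketch with made-up two-line input:

```shell
# A non-zero constant is an always-true pattern; with no action, awk prints the line.
all=$(printf 'a\nb\n' | awk '7')

# Zero is always false, so nothing is printed.
none=$(printf 'a\nb\n' | awk '0')

# Any other expression works the same way: only lines where it is true are printed.
second=$(printf 'a\nb\n' | awk 'NR==2')

printf 'all=[%s] none=[%s] second=[%s]\n' "$all" "$none" "$second"
```

So the trailing `7` in the one-liner is shorthand for `{print}` applied to every line of f2, whether or not the earlier rule modified it.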