0

I have a flat file separated by | that I want to update from information already inside of the flat file. I want to fill the third field using information from the first and second. From the first field I want to ignore the last two numbers when using that data to compare against the data missing the third field. When matching against the second field I want it to be exact. I do not want to create a new flat file. I want to update the existing file. I researched a way to pull out the first two fields from the file but I do not know if that will even be helpful for the goal I am trying to achieve. To sum all of that up, I want to compare the first and second fields to other fields in the file to pull the third field that may be missing on some of the lines on the flat file.

awk -F'|' -v OFS='|' '{sub(/[0-9 ]+$/,"",$1)}1 {print $1 "\t" $2}' tstfile

first field|second field|third field

Original intput:

t1ttt01|/a1

t1ttt01|/b1

t1ttt01|/c1

t1ttt03|/a1|1

t1ttt03|/b1|1

t1ttt03|/c1|1

l1ttt03|/a1|3

l1ttt03|/b1|3

l1ttt03|/c1|3

What it should do:

t1ttt03|/a1|1 = t1ttt01|/a1

when comparing t1ttt|/a1| = t1ttt|/a1

Therefore

t1ttt01|/a1 becomes t1ttt01|/a1|/1

What I want the Output to look like:

t1ttt01|/a1|1

t1ttt01|/b1|1

t1ttt01|/c1|1

t1ttt03|/a1|1

t1ttt03|/b1|1

t1ttt03|/c1|1

l1ttt03|/a1|3

l1ttt03|/b1|3

l1ttt03|/c1|3
jaypal singh
  • 74,723
  • 23
  • 102
  • 147
user2607210
  • 7
  • 1
  • 4
  • You need to rewrite this question, it is very very hard to understand what you are trying to do here. Please post a sample of the original input and the exact way you want it formatted for the output. – Geoffrey Jul 24 '13 at 14:55

1 Answers1

0

One way with awk:

awk '

# set the input and output field separator to "|"

BEGIN{FS=OFS="|"}

# Do this action when number of fields on a line is 3 for first file only. The
# action is to strip the number portion from first field and store it as a key
# along with the second field. The value of this should be field 3

NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next} 

# For the second file if number of fields is 2, store the line in a variable
# called line. Validate if field 1 (without numbers) and 2 is present in
# our array. If so, print the line followed by "|" followed by value from array.

NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1
' file file

Test:

$ cat file
t1ttt01|/a1
t1ttt01|/b1
t1ttt01|/c1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
$ awk 'BEGIN{FS=OFS="|"}NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1' file file
t1ttt01|/a1|1
t1ttt01|/b1|1
t1ttt01|/c1|1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
jaypal singh
  • 74,723
  • 23
  • 102
  • 147
  • Jaypal that almost works. If I use the example I gave you it work fine but my file is huge. Is that the reason why it is not working? – user2607210 Jul 24 '13 at 16:23
  • What's not working? What is the error message? How big is your file? – jaypal singh Jul 24 '13 at 16:28
  • @user2607210 One potential issue I see with the answer is that if order is not preserved in the input data I will overwrite the value in my array and you may see no value in column 3. I have fixed the answer. Can you please try again and let me know if you still see an issue. – jaypal singh Jul 24 '13 at 16:54
  • If i do a ll -s my file is 272 and it will grow. It still is not quote working. – user2607210 Jul 24 '13 at 17:27
  • Do you get any errors or it just prints the file as is without adding column 3? Was your file created on Windows? – jaypal singh Jul 24 '13 at 17:32
  • I do not get any errors and it does not add anything to the column 3. This file was created as a csv in windows and then all of the , where replace with | but has been uploaded as a flat file to Linux. The other part of useful information is that there are a few instances where the first and second column will not match up with other columns. – user2607210 Jul 24 '13 at 17:45
  • The problem is your files has `^M` characters since it was created on windows. Do `dos2unix filename` to convert it to unix format and run the `awk` command. – jaypal singh Jul 24 '13 at 17:52
  • I did not upload the file, I remember now. I used vi and copied from windows and then pasted it into linux. I get the following when I try to run your command. -bash: dos2unix: command not found – user2607210 Jul 24 '13 at 17:59
  • Do this on your file `sed -i 's/^M//' filename`. Where `^M` is entered by pressing `Ctrl M`. This will convert the file to linux format and you can run `awk` on it. – jaypal singh Jul 24 '13 at 19:49
  • Jaypal I am still testing with my file I will give you an update in a few. – user2607210 Jul 25 '13 at 13:52
  • Jaypal I found the problem. It was writing over the old file. The way I tested that was by doing the following. awk ' BEGIN{FS=OFS="|"} NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next} NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1 ' filesystemstst filesystemstst > output Is there a way to make it write to the same file? – user2607210 Jul 25 '13 at 14:18
  • @user2607210 With `GNU awk 4.1` you can do in-file changes. If you don't have that just redirect to a `tmp` file and do `mv tmp originalfile`. So something like `awk {...} file file > tmp && mv tmp file` – jaypal singh Jul 25 '13 at 14:20
  • I also wanted to sort the file. When i do the following I get duplicates. awk ' BEGIN{FS=OFS="|"} NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next} NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1 ' filesystemstst filesystemstst | sort -t '|' > tmp2 && mv -f tmp2 filesystemstst – user2607210 Jul 25 '13 at 15:09
  • I think I go it I ended up using uniq awk ' BEGIN{FS=OFS="|"} NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next} NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1 ' filesystemstst filesystemstst | sort -t '|' | uniq > tmp2 && mv -f tmp2 filesystemstst Would you mind giving a quick explanation of your code? – user2607210 Jul 25 '13 at 15:24
  • @user2607210 Added the explanation. Hope that helps. – jaypal singh Jul 26 '13 at 17:31
  • Thanks Jaypal you are the awk master. – user2607210 Jul 26 '13 at 17:44
  • @user2607210 I wish I was my friend. Still learning new things about it every day! `:)`. – jaypal singh Jul 26 '13 at 17:49
  • I have one small issue. Some lines will not have a third field, because they are the first of there kind. The problem I have now is the lines that are the first of there kind do not have matches. They are being deleted from my file. example: t5ttt33 |/ example: t5ttt33 |/var – user2607210 Jul 26 '13 at 19:11