Creating a new file while comparing two columns in two files using awk

Question

i am trying to compare two files (f1,f2) and create a third one f3. if field $1 is the same in both files then generate an sql query like this :

file f1

db1
db2
db3

file f2

db1 tab11
db1 tab12
db1 tab13
db2 tab21
db2 tab22
db3 tab31
db3 tab32
db3 tab33
db3 tab34

file f3


    ( TRIM(C.DATABASENAME) = 'db1' AND C.TABLENAME IN ('tab11', 'tab12','tab13')) OR
    ( TRIM(C.DATABASENAME) = 'db2' AND C.TABLENAME IN ('tab21', 'tab22')) OR
    ( TRIM(C.DATABASENAME) = 'db3' AND C.TABLENAME IN ('tab31', 'tab32', 'tab33','tab34' ))

Is it possible to create the file f3 with awk ?

this is what i came up with but it generates a file wit the wrong format of data

while read db; do
awk -v mydb=$db '{if ($1=mydb) printf "( TRIM(C.DATABASENAME) ="$1 "AND C.TABLENAME IN (" $2 "," }' f2 > f3
done < f1

On SO we do encourage people to add their efforts which they have put in order to solve their own problems so kindly do add the same and let us know then. — RavinderSingh13, Jan 17 '20 at 17:53
Thank you for adding your efforts :), I have added my solution, which produces exact output as your shown samples, please check it once and lemme know in its comment section how it goes.Also please do always add your efforts in your question like you did now, cheers :) — RavinderSingh13, Jan 21 '20 at 02:09
What happened with `tab34`? Why is `tab34` creating a table for `db4`? Shoudn't it be `( TRIM(C.DATABASENAME) = 'db3' AND C.TABLENAME IN ('tab31', 'tab32', 'tab33'))` in file f3? — KamilCuk, Jan 21 '20 at 02:45

RavinderSingh13 · Accepted Answer · 2020-01-21T03:58:56.507

Could you please try following.

awk '
BEGIN{
  s1="\047"
  OFS=", "
}
FNR==NR{
  a[$1]=(a[$1]?a[$1] OFS:"")s1 $2 s1
  next
}
($1 in a){
  print "( TRIM(C.DATABASENAME) = " s1 $1 s1 " AND C.TABLENAME IN (" a[$1]" )) OR"
}
'  Input_file2   Input_file1

Output will be as follows.

( TRIM(C.DATABASENAME) = 'db1' AND C.TABLENAME IN ('tab11', 'tab12', 'tab13' )) OR
( TRIM(C.DATABASENAME) = 'db2' AND C.TABLENAME IN ('tab21', 'tab22' )) OR
( TRIM(C.DATABASENAME) = 'db3' AND C.TABLENAME IN ('tab31', 'tab32', 'tab33' )) OR

Explanation: Adding detailed explanation for above code.

awk '                                               ##Starting awk program from here.               
BEGIN{                                              ##Starting BEGIN section of this program here.
  s1="\047"                                         ##Creating value of variable s1 to \047 which is octal value of single quote.
  OFS=", "
}
FNR==NR{                                            ##Checking condition FNR==NR which will be TRUE when first Input_file named file2 is being read.
  a[$1]=(a[$1]?a[$1] OFS:"")s1 $2 s1                ##Creating an array named a whose index is $1 and its value is $2 which is keep on adding to its own value for whole Input_file2.
  next                                              ##next will skip all further statements from here.
}
($1 in a){                                          ##Checking condition if $1 of Input_file1 current line is present in array a then do following.
  print "( TRIM(C.DATABASENAME) = " s1 $1 s1 " \
AND C.TABLENAME IN (" a[$1]" )) OR"                 ##Printing exact line like OP said.
}
'  file2  file1                                     ##Mentioning Input_file names here.

Don't want to be THAT guy, but `,` is missing. `(a[$1]?a[$1] OFS:"")s1 $2 s1` -> `(a[$1]?a[$1] ", ":"")s1 $2 s1` or `-vOFS=", "`. I guess the trailing `OR` we can remove with `sed '$s/OR//'`. Or like `for (i in a) { last=a[i]; } for (i in a) { print "( TRIM....." (a[i] != last ? "OR" : "") }` — KamilCuk, Jan 21 '20 at 02:47
@KamilCuk, Thank you for letting know Kamil, I have fixed the code now, cheers and keep rocking :) — RavinderSingh13, Jan 21 '20 at 03:59
Thank you all for your answers, comments and help! It solved my problem — AMI, Jan 21 '20 at 11:16

Masklinn · Answer 2 · 2020-01-20T14:54:28.893

0

You should be able to do essentially all you want in awk, but you probably want to use something else here:

you can use join(1) to filter file2 (join file1 file2 will return just the lines of file2 which match an entry of file1)
you can easily use awk to group tables by database
however the final formatting (which you'd do in the END pattern) is going to be absolute hell because join(ar, sep) is quite badly supported in awk, and since it has to be nested here it's going to be a mess.

You'd have a much easier time using something like Python or Ruby or PHP there. Hell, I think you'd have an easier time using bash.

edited Jan 20 '20 at 14:54

answered Jan 20 '20 at 10:18

Masklinn

34,759
3
38
57

Can we try to use just f2 and as long as the first column doesn't change loop on it and print the second column. – AMI Jan 20 '20 at 12:54
Sure but I'm not sure it's going to be a walk in the park, you'll still have some gnarly conditionals for the state change and formatting the various bits. Basically you'll be writing an awk script, at which point you're probably better off writing a script in whatever scripting language floats your boats where that sort of things is very easy. – Masklinn Jan 20 '20 at 14:56

Creating a new file while comparing two columns in two files using awk

2 Answers2