bash -- merging and manipulating 2 files

Question

I have 2 files, each of which I currently process separately with awk:

======================= File 1: ===================

 0x0002 RUNNING  EXISTS foo 253 65535
 0x0003 RUNNING  EXISTS foo 252 5
 0x0004 RUNNING  EXISTS foo 251 3

I'm interested in the first field and the last 2.

Field 1 is the vdisk (in hex). The last two fields are the possible cdisks for each vdisk; at least one must exist. These values are decimal. If the number 65535 appears, the 2nd cdisk does not exist.

I use this awk to display a user friendly table:

awk 'BEGIN {print "vdisk cdisk  Mr_cdisk"}
$3 ~ /EXISTS|THIS_AGENT_ONLINE/ {
    if ($NF == 65535) $NF = "N/A"    # 65535 means the 2nd cdisk does not exist
    printf "%-11s %-6s %s\n", $1, $(NF-1), $NF
}' "${FILE}"

This produces the following table:

vdisk  cdisk  Mr_cdisk
0x0002 253    N/A
0x0003 252    5
0x0004 251    3
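The table above keeps both cdisks of a vdisk on one line. The merge described further down needs them spread out, one line per existing cdisk, with the decimal value converted to the `0x%04x` form that File 2 uses. A minimal sketch of that step, assuming the File 1 layout shown above; the helper name `spread_cdisks` and the temporary sample file are made up for illustration:

```shell
#!/bin/sh
# Sample data copied from File 1 above, written to a temporary file.
f1=$(mktemp); trap 'rm -f "$f1"' EXIT
printf '%s\n' \
  ' 0x0002 RUNNING  EXISTS foo 253 65535' \
  ' 0x0003 RUNNING  EXISTS foo 252 5' \
  ' 0x0004 RUNNING  EXISTS foo 251 3' > "$f1"

# spread_cdisks FILE (hypothetical helper): print "vdisk cdisk" once per
# existing cdisk, converting the decimal cdisk to 4-digit hex.
spread_cdisks() {
  awk '$3 ~ /EXISTS|THIS_AGENT_ONLINE/ {
    for (i = NF - 1; i <= NF; i++)      # the last two fields are the cdisks
      if ($i != 65535)                  # 65535 means "no such cdisk"
        printf "%s 0x%04x\n", $1, $i
  }' "$1"
}

spread_cdisks "$f1"
```

For the sample data this prints `0x0002 0x00fd`, `0x0003 0x00fc`, `0x0003 0x0005`, `0x0004 0x00fb` and `0x0004 0x0003`; the 65535 placeholder simply produces no line.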

======================= File 2: ===================

0x0000 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0
0x0001 Cmp own Foo 1 NONE 0 0x0 0x0 0x0
0x0002 Cmp cli Foo 0 SOME 0 0x0 0x1 0x0
0x0003 Cmp own Foo 0 NONE 0 0x0 0x0 0x1
0x0004 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0
0x0005 Cmp own Foo 1 NONE 0 0x1 0x0 0x0

I'm interested in the "Cmp own" lines, in which the first field is the cdisk (in hex). The field just before the SOME/NONE text ($(NF-5) in awk terms) is the instance number; it is either 0 or 1. I use this awk to display a user friendly table:

awk 'BEGIN {print "cdisk(hex)  Instance"}
/Cmp own/ {
    printf "%-11s %-10s\n", $1, $(NF-5)    # $(NF-5) is the field just before SOME/NONE
}' "${FILE}"

This will produce the following table:

cdisk(hex)  Instance
0x0001      1
0x0003      0
0x0005      1
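To combine this with File 1 later, the instance numbers are easier to use as a lookup table than as printed output. A sketch, assuming the File 2 layout above and a hypothetical helper `lookup_instance` that is given a cdisk already in `0x%04x` form. It matches on the second and third fields instead of the `/Cmp own/` regex, which also tolerates extra spaces between them:

```shell
#!/bin/sh
# Sample data copied from File 2 above, written to a temporary file.
f2=$(mktemp); trap 'rm -f "$f2"' EXIT
printf '%s\n' \
  '0x0000 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0001 Cmp own Foo 1 NONE 0 0x0 0x0 0x0' \
  '0x0002 Cmp cli Foo 0 SOME 0 0x0 0x1 0x0' \
  '0x0003 Cmp own Foo 0 NONE 0 0x0 0x0 0x1' \
  '0x0004 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0005 Cmp own Foo 1 NONE 0 0x1 0x0 0x0' > "$f2"

# lookup_instance FILE CDISK (hypothetical helper): print the instance of a
# "Cmp own" cdisk, or N/A when File 2 has no "Cmp own" line for it.
lookup_instance() {
  awk -v key="$2" '
    $2 == "Cmp" && $3 == "own" { inst[$1] = $(NF-5) }   # cdisk (hex) -> instance
    END { print ((key in inst) ? inst[key] : "N/A") }
  ' "$1"
}

for c in 0x0001 0x0003 0x0005 0x00fd; do
  printf '%s %s\n' "$c" "$(lookup_instance "$f2" "$c")"
done
```

With the sample data, 0x0001 and 0x0005 map to 1, 0x0003 maps to 0, and anything without a `Cmp own` line (such as 0x0002 or 0x00fd) gives N/A.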

++++++++++++++++++++++++++++++++++++++

What I would like is to display a merged table, preferably directly from the original files. Each vdisk from File 1 should be spread over 2 lines (if it has more than 1 cdisk); this is the base for the merge. Then, for each cdisk, print its Instance number from File 2 if one exists.

vdisk(hex)  cdisk(hex)  Instance
0x0002      0x00fd      N/A
0x0003      0x00fc      N/A
0x0003      0x0005      1
0x0004      0x0001      0
0x0004      0x0003      1

I would definitely prefer a solution with awk. :)

Thanks!

EDIT: Added some more info and a correction to one data table.

EDIT2: Simplified input

So which field are you trying to merge on? Do you still want the separate tables, or are you looking for a way to go directly from the input files to the final output? — Tom Fenech, Aug 10 '14 at 14:33
I would prefer to get the final output directly. I need to merge on the cdisk field. Each vdisk can have at most 2 cdisks: the cdisk field and/or the Mr_cdisk field. — Maxim_united, Aug 10 '14 at 14:53
I think with a little effort you could make your problem easier for us to understand and so help us to help you. Get rid of the --non-decimal data flag and reduce your sample input output to 3 or 4 lines of 3 or 4 space-separated fields each that represent your current problem. — Ed Morton, Aug 10 '14 at 15:10
Added some more info. Hope this makes it a bit clearer. The non-decimal flag makes it "understand" that the input (first field in both files) is in hex, making it easier to convert to decimal. — Maxim_united, Aug 10 '14 at 15:31
I know what the non-decimal flag does; my point is that it's just obfuscating your question. WE don't need to know anything about that to answer THIS QUESTION. The fact that you need to deal with that in your real data doesn't stop you from posting a simplified version of your data that's less complex and so highlights what your actual, current question is and makes it much easier for us to help you. If you did what I suggested, you'd certainly have an answer by now. — Ed Morton, Aug 10 '14 at 15:48
Not sure I fully understood you in the beginning. I've simplified the problem a bit more. Hope that helps. — Maxim_united, Aug 10 '14 at 16:22
What is the size of the real data? I see two solutions: one is to read the first file and fetch the "Instance" column for each vdisk/cdisk. The other is to prepare two temporary results and then merge them. The second solution is a little more complex, but we can divide and conquer AND, more importantly, it has better complexity. — mcoolive, Aug 10 '14 at 19:12
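One reading of the second suggestion (two temporary results, then a merge) is to let awk write a cdisk-keyed list from each file and let the standard `join` utility do the merge. This is a sketch of that interpretation, not code from the thread. It assumes `mktemp`, `sort` and `join` are available, uses a hypothetical helper `merge_with_join`, and sorts both lists in the C locale because `join` requires its inputs sorted on the join field:

```shell
#!/bin/sh
# Sample data copied from the question, written to temporary files.
f1=$(mktemp); f2=$(mktemp); trap 'rm -f "$f1" "$f2"' EXIT
printf '%s\n' \
  ' 0x0002 RUNNING  EXISTS foo 253 65535' \
  ' 0x0003 RUNNING  EXISTS foo 252 5' \
  ' 0x0004 RUNNING  EXISTS foo 251 3' > "$f1"
printf '%s\n' \
  '0x0000 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0001 Cmp own Foo 1 NONE 0 0x0 0x0 0x0' \
  '0x0002 Cmp cli Foo 0 SOME 0 0x0 0x1 0x0' \
  '0x0003 Cmp own Foo 0 NONE 0 0x0 0x0 0x1' \
  '0x0004 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0005 Cmp own Foo 1 NONE 0 0x1 0x0 0x0' > "$f2"

# merge_with_join FILE1 FILE2 (hypothetical helper)
merge_with_join() {
  t1=$(mktemp); t2=$(mktemp)
  # Temporary result 1: "cdisk vdisk", one line per existing cdisk.
  awk '$3 ~ /EXISTS|THIS_AGENT_ONLINE/ {
         for (i = NF - 1; i <= NF; i++)
           if ($i != 65535) printf "0x%04x %s\n", $i, $1
       }' "$1" | LC_ALL=C sort -k1,1 > "$t1"
  # Temporary result 2: "cdisk instance" for the "Cmp own" lines only.
  awk '$2 == "Cmp" && $3 == "own" { print $1, $(NF-5) }' "$2" |
    LC_ALL=C sort -k1,1 > "$t2"
  # Left outer join on the cdisk; -e fills a missing instance with N/A.
  LC_ALL=C join -a 1 -e N/A -o 1.2,0,2.2 "$t1" "$t2" | LC_ALL=C sort -k1,1 -k2,2
  rm -f "$t1" "$t2"
}

merge_with_join "$f1" "$f2"
```

`-a 1` keeps cdisks that have no `Cmp own` line, and `-e N/A` together with `-o` fills in their missing instance field. The final `sort` only restores vdisk order; within a vdisk, rows come out in cdisk string order rather than File 1's column order.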

score 0 · Answer 1 · answered Aug 11 '14 at 00:53


I couldn't figure out what the mapping is from your 2 input files to your output, but this should point you in the right direction:

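$ cat tst.awk
# 1st file (NR==FNR): map each vdisk to its two cdisks, converted to hex
NR==FNR {
    v2c[$1] = sprintf("0x%04x",$5)
    v2m[$1] = ( $6==65535 ? "N/A" : sprintf("0x%04x",$6) )   # 65535 = no 2nd cdisk
    next
}

# 2nd file: for lines keyed by a known vdisk, print both cdisks plus this line's $5
$1 in v2c {
    print $1, v2c[$1], $5
    print $1, v2m[$1], $5
}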
$
$ awk -f tst.awk file1 file2
0x0002 0x00fd 0
0x0002 N/A 0
0x0003 0x00fc 0
0x0003 0x0005 0
0x0004 0x00fb 0
0x0004 0x0003 0
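The script above looks up File 2's first field as if it were a vdisk, but in File 2 that field is the cdisk. A sketch of the same two-pass idea keyed on the cdisk instead, written for illustration and not part of the posted answer. It reads File 2 first so the lookup is complete before File 1 is expanded; the helper name `merge_disks` is hypothetical:

```shell
#!/bin/sh
# Sample data copied from the question, written to temporary files.
f1=$(mktemp); f2=$(mktemp); trap 'rm -f "$f1" "$f2"' EXIT
printf '%s\n' \
  ' 0x0002 RUNNING  EXISTS foo 253 65535' \
  ' 0x0003 RUNNING  EXISTS foo 252 5' \
  ' 0x0004 RUNNING  EXISTS foo 251 3' > "$f1"
printf '%s\n' \
  '0x0000 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0001 Cmp own Foo 1 NONE 0 0x0 0x0 0x0' \
  '0x0002 Cmp cli Foo 0 SOME 0 0x0 0x1 0x0' \
  '0x0003 Cmp own Foo 0 NONE 0 0x0 0x0 0x1' \
  '0x0004 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0' \
  '0x0005 Cmp own Foo 1 NONE 0 0x1 0x0 0x0' > "$f2"

# merge_disks FILE1 FILE2 (hypothetical helper); note File 2 is passed first.
merge_disks() {
  awk '
    BEGIN { printf "%-11s %-11s %s\n", "vdisk(hex)", "cdisk(hex)", "Instance" }
    NR == FNR {                                  # File 2: cdisk -> instance
      if ($2 == "Cmp" && $3 == "own") inst[$1] = $(NF-5)
      next
    }
    $3 ~ /EXISTS|THIS_AGENT_ONLINE/ {            # File 1: one row per cdisk
      for (i = NF - 1; i <= NF; i++) {
        if ($i == 65535) continue                # no such cdisk
        c = sprintf("0x%04x", $i)                # decimal -> File 2 key format
        printf "%-11s %-11s %s\n", $1, c, ((c in inst) ? inst[c] : "N/A")
      }
    }
  ' "$2" "$1"
}

merge_disks "$f1" "$f2"
```

On the sample files this prints N/A for 0x00fd, 0x00fc and 0x00fb, 1 for 0x0005 and 0 for 0x0003. The sample File 1 gives vdisk 0x0004 the cdisks 251 (0x00fb) and 3, and File 2 gives cdisk 0x0003 instance 0, so those rows differ from the hand-written expected table in the question.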


Ed Morton


I'm not sure what to elaborate on; it's just populating a couple of associative arrays when reading the 1st file and then printing them along with fields from the 2nd file when reading the 2nd file. Is there anything in particular you don't understand? – Ed Morton Aug 11 '14 at 13:31
Well, I wasn't that aware of array usage in awk. How is each file separated into each array? Also, the result is a bit different from my example and expected table. vdisk 0x2 is associated with cdisk 0xfd only (the 2nd one is N/A), so it should have only one line in the result. For vdisk 0x3 with cdisk 0x5, and vdisk 0x4 with cdisk 0x3, the instance should be 1 in both cases. Also, vdisk 0x2, cdisk 0xfd should not have 0 as its instance; it should be N/A. I'm sorry if I'm having trouble defining exactly what I need. Your help is appreciated. – Maxim_united Aug 11 '14 at 14:31

Each file is automatically read one line at a time and split into fields $1 to $NF (separated by whitespace by default), so when you see `arr[$1] = $2` it's populating an array `arr` indexed by the first field values from your input file and containing the 2nd field values. I know the output isn't what you want; I just couldn't figure out how you're mapping the file contents to that output, so I gave you the structure you need and you can tidy up the mappings. – Ed Morton Aug 11 '14 at 14:40
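A toy illustration of the `arr[$1] = $2` pattern combined with the `NR==FNR` test used in the answer, with made-up two-column files. `NR` counts lines across all input files while `FNR` restarts at 1 for each file, so `NR == FNR` is true only while the first file is being read. The helper name `lookup_demo` is hypothetical:

```shell
#!/bin/sh
# Two made-up input files: "key value" pairs, then lines to look up.
keys=$(mktemp); data=$(mktemp); trap 'rm -f "$keys" "$data"' EXIT
printf '%s\n' 'a 1' 'b 2' > "$keys"
printf '%s\n' 'b x' 'c y' 'a z' > "$data"

# lookup_demo KEYS DATA (hypothetical helper)
lookup_demo() {
  awk 'NR == FNR { arr[$1] = $2; next }                  # 1st file: key -> value
       { print $0, (($1 in arr) ? arr[$1] : "none") }    # 2nd file: look key up
  ' "$1" "$2"
}

lookup_demo "$keys" "$data"
```

This prints `b x 2`, `c y none` and `a z 1`: each line of the second file, in its own order, followed by the value stored for its key.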
So each array is storing each file? – Maxim_united Aug 11 '14 at 15:03
No, each array stores a mapping from a key field to a specific value from the first file. Get the book "Effective Awk Programming, Third Edition" by Arnold Robbins. I think you're just missing too much of the basics for us to get anywhere with Q&A. – Ed Morton Aug 11 '14 at 15:10

@Maxim_united - also, try adding print statements to the script to dump the array indices and values and anything else you're not sure of. – Ed Morton Aug 11 '14 at 16:04
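Following that suggestion, a sketch that builds the same `v2c` and `v2m` arrays as `tst.awk` from File 1 and dumps every index/value pair in an `END` block. The helper name `dump_arrays` is made up, and the output is piped through `sort` because `for (k in arr)` visits keys in no guaranteed order:

```shell
#!/bin/sh
# Sample data copied from File 1 in the question.
f1=$(mktemp); trap 'rm -f "$f1"' EXIT
printf '%s\n' \
  ' 0x0002 RUNNING  EXISTS foo 253 65535' \
  ' 0x0003 RUNNING  EXISTS foo 252 5' \
  ' 0x0004 RUNNING  EXISTS foo 251 3' > "$f1"

# dump_arrays FILE1 (hypothetical helper): build v2c/v2m as tst.awk does,
# then print each index with both stored values.
dump_arrays() {
  awk '{
         v2c[$1] = sprintf("0x%04x", $5)
         v2m[$1] = ($6 == 65535 ? "N/A" : sprintf("0x%04x", $6))
       }
       END {
         for (k in v2c) printf "v2c[%s]=%s v2m[%s]=%s\n", k, v2c[k], k, v2m[k]
       }' "$1" | sort
}

dump_arrays "$f1"
```

For the sample File 1 this shows, for example, `v2c[0x0003]=0x00fc v2m[0x0003]=0x0005`.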
