merging 2 files into a third one, using columns as index and merging lines too

Question

I've been studying awk and I've come upon a problem I haven't been able to solve. Please help if you can.

I have 2 files I generated using awk, sort and uniq -c.

File 1 is in the format:

1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977

Meaning: number_of_equal_files name date (so there are 3 files named ccc.c with the same date and 1 file named ccc.c with another date)

File 2 is in the format:

4 ddd.c
2 ccc.c
3 xxx.c

Meaning: number_of_different_dates name (so ccc.c has been found with 2 different dates). I removed the entries that would have number=1 using a reverse grep, so there won't be any.

What I'd like to do is generate a third file in the format:

number_of_different_dates name date1 date2 date3 date4 (...)

Something like:

2 ccc.c 2/2/2012 20/6/2011
4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977

Thanks in advance!

Oh, and the files are ordered by name! I put the example in the wrong order, sorry. – Jonas Tadeu, Jan 19 '12 at 14:02

Mark Wilkins · Accepted Answer · 2012-01-19T14:37:29.917

2

You should be able to get that result using only the first file as input. The following uses two associative arrays. The first collects the number of times a file is seen and the second collects the dates. The END block just prints the entries that appeared more than once.

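{
   counts[$2] += 1;                                        # lines (distinct dates) seen for this name
   dates[$2] = (dates[$2] == "") ? $3 : dates[$2] " " $3;  # space-separated dates, no leading space
}

END {
   # Print only the names that appeared with more than one date.
   # Note: "for ( f in dates )" visits names in an unspecified order.
   for ( f in dates ) {
      if ( counts[f] > 1 )
         printf( "%d %s %s\n", counts[f], f, dates[f] );
   }
}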
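Since the asker says the input is ordered by name, here is a rough sketch of a streaming variant (not the method above, just an alternative under that sorted-input assumption). It prints each group as soon as the name changes, so the output follows the input order instead of depending on how awk walks an array. The names `merge_dates` and `emit` are made up for illustration.

```shell
#!/bin/sh
# merge_dates (hypothetical helper name) reads file1-style lines
# ("count name date") from the file given as $1.
# Assumption: the file is sorted by name, as the asker states.
merge_dates() {
    awk '
        # Print the finished group if it had more than one date.
        function emit() {
            if (n > 1) printf("%d %s %s\n", n, name, list)
        }
        NF < 3 { next }                  # skip blank or malformed lines
        $2 != name {                     # a new name starts a new group
            emit()
            name = $2; n = 0; list = ""
        }
        {
            n++                          # one more date for this name
            list = (list == "") ? $3 : list " " $3
        }
        END { emit() }                   # do not forget the last group
    ' "$1"
}
```

Called as `merge_dates file1 > file3`, this should give the two lines from the question for the sample data, in the same order as the input.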

edited Jan 19 '12 at 14:37

answered Jan 19 '12 at 14:01


Awesome, thanks for the quick reply! It doesn't show the number of different dates, but that's not important. I just added it here. Thanks again! – Jonas Tadeu Jan 19 '12 at 14:34
@JonasTadeu: Ah! You are right; I didn't notice that in the example. I updated the printf to add it in. – Mark Wilkins Jan 19 '12 at 14:38
Yeah, I did the same, only I put counts between f and dates. Anyway, it's amazing; it'll be so easy to erase useless files. Thanks again! – Jonas Tadeu Jan 19 '12 at 15:16

jaypal singh · Answer 2 · 2012-01-19T22:56:58.410

1

You can try something like this:

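#!/usr/bin/awk -f

# First file (file1): remember every name/date pair and count the lines per name.
# Keying on (name, date) keeps two names that share a date from overwriting each other.
NR==FNR {
            if (NF) { a[$2,$3]; b[$2]++ }
            next
        }

# Second file (file2): print count and name, then every date stored for that name.
($2 in b) {
            printf("%s %s ", $1, $2)
            for (k in a) {
                split(k, p, SUBSEP)
                if (p[1] == $2)
                    printf("%s ", p[2])
            }
            print ""
          }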

Test:

[jaypal:~/Temp] cat file1
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977

[jaypal:~/Temp] cat file2
4 ddd.c
2 ccc.c
3 xxx.c

[jaypal:~/Temp] ./s.awk file1 file2
4 ddd.c 10/1/1977 1/1/2010 2/4/1999 7/1/2012
2 ccc.c 20/6/2011 2/2/2012

edited Jan 19 '12 at 22:56

answered Jan 19 '12 at 14:55


Thanks for the answer too, but I chose Mark's because it's a little easier to understand. Thanks anyway! – Jonas Tadeu Jan 19 '12 at 15:18
Sure no problem. Just wanted to offer an alternate. :) – jaypal singh Jan 19 '12 at 15:19
