
I've been studying awk and I've come upon a problem I haven't been able to solve. Please help if you can.

I have two files that I generated using awk, sort and uniq -c.

File 1 is in the format:

1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977

Meaning: number_of_equal_files name date (so there are 3 copies of ccc.c with one date and 1 copy of ccc.c with another).

File 2 is in the format:

4 ddd.c
2 ccc.c
3 xxx.c

Meaning: number_of_different_dates name (so ccc.c has been found with 2 different dates). I removed the files that would have a count of 1 using a reverse grep (grep -v), so there won't be any.

What I'd like to do is generate a third file in the format

number_of_different_dates name date1 date2 date3 date4 (...)

Something like:

2 ccc.c 2/2/2012 20/6/2011
4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977

Thanks in advance!

jaypal singh

2 Answers


You should be able to get that result using only the first file as input. The following uses two associative arrays. The first counts how many lines each file name appears on; because every line of the first file is a distinct name/date pair, that count is the number of different dates. The second collects the dates. The END block prints only the names that appear more than once.

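{
   counts[$2] += 1;                               # lines per name = number of different dates
   dates[$2] = sprintf( "%s %s", dates[$2], $3 ); # space-separated date list (starts with a space)
}

END {
   for ( f in dates ) {
      if ( counts[f] > 1 )
         # dates[f] already begins with a space, so no separator before it
         printf( "%d %s%s\n", counts[f], f, dates[f] );
   }
}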
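The for ( f in dates ) loop visits names in an unspecified order, so the output order can differ between awk implementations. Below is a minimal sketch, assuming you want names listed in the order they first appear in the first file. It keeps the same two-array idea; the shell function name multi_dates, the order array, the NF < 3 guard for blank lines, and the temporary sample file are hypothetical additions for illustration only.

```shell
#!/bin/sh
# Sketch only: same counts/dates arrays as the answer, plus a hypothetical
# "order" array so names are printed in first-seen order.
# Usage: multi_dates [FILE1]   (reads stdin when no file is given)
multi_dates() {
    awk '
        NF < 3 { next }                       # skip blank or short lines
        !($2 in counts) { order[++n] = $2 }   # remember first appearance
        {
            counts[$2]++                      # one line per distinct date
            dates[$2] = dates[$2] " " $3      # leading space is intentional
        }
        END {
            for (i = 1; i <= n; i++) {
                f = order[i]
                if (counts[f] > 1)
                    printf "%d %s%s\n", counts[f], f, dates[f]
            }
        }
    ' "$@"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
EOF

multi_dates "$sample"
```

With the question's sample data this prints "2 ccc.c 2/2/2012 20/6/2011" followed by "4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977".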
Mark Wilkins
  • Awesome, thanks for the quick reply! It doesn't show the number of different dates, but that's not important; I just added it here. Thanks again! – Jonas Tadeu Jan 19 '12 at 14:34
  • @JonasTadeu: Ah! You are right; I didn't notice that in the example. I updated the printf to add it in. – Mark Wilkins Jan 19 '12 at 14:38
  • Yeah, I did the same, only I put counts between f and dates. Anyway, it's amazing; it'll be so easy to erase useless files. Thanks again! – Jonas Tadeu Jan 19 '12 at 15:16

You can try something like this:

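#!/usr/bin/awk -f

# First file: remember every (name, date) pair and which names exist.
# Keying on both fields keeps dates that several files happen to share.
NR==FNR {
    a[$2, $3] = 1
    b[$2]++
    next
}

# Second file: for each name also seen in the first file, print
# "count name" followed by that name's dates (order is unspecified).
($2 in b) {
    printf "%s %s", $1, $2
    for (k in a) {
        split(k, parts, SUBSEP)
        if (parts[1] == $2)
            printf " %s", parts[2]
    }
    print ""
}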

Test:

[jaypal:~/Temp] cat file1
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
[jaypal:~/Temp] cat file2
4 ddd.c
2 ccc.c
3 xxx.c
[jaypal:~/Temp] ./s.awk file1 file2
4 ddd.c 10/1/1977 1/1/2010 2/4/1999 7/1/2012
2 ccc.c 20/6/2011 2/2/2012
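If you'd rather have a deterministic date order with the two-file approach, one option is to build each name's date list during the first pass instead of scanning an array in the second. Below is a minimal sketch under that assumption: dates come out in the order they appear in file 1, and names in the order they appear in file 2. The function name dates_for_names, the blank-line guard, and the temporary files are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch only: pass 1 reads file 1 ("count name date") and appends each
# date to a list keyed by name; pass 2 reads file 2 ("count name") and
# prints the names that were seen in pass 1.
# Usage: dates_for_names FILE1 FILE2
dates_for_names() {
    awk '
        NF < 2 { next }                  # ignore blank lines in either file
        NR == FNR {                      # first file only
            dates[$2] = dates[$2] " " $3 # keyed by name, so shared dates are kept
            next
        }
        ($2 in dates) {                  # second file: name also seen in file 1
            print $1, $2 dates[$2]       # dates[$2] already starts with a space
        }
    ' "$@"
}

# Sample data from the question, written to temporary files.
f1=$(mktemp)
f2=$(mktemp)
trap 'rm -f "$f1" "$f2"' EXIT
cat > "$f1" <<'EOF'
1 aaa.c 10/10/2010
1 bbb.h 1/1/2011
3 ccc.c 2/2/2012
1 ccc.c 20/6/2011
1 ddd.c 1/1/2010
1 ddd.c 2/4/1999
1 ddd.c 7/1/2012
1 ddd.c 10/1/1977
EOF
printf '%s\n' '4 ddd.c' '2 ccc.c' '3 xxx.c' > "$f2"

dates_for_names "$f1" "$f2"
```

With the question's sample data this prints "4 ddd.c 1/1/2010 2/4/1999 7/1/2012 10/1/1977" followed by "2 ccc.c 2/2/2012 20/6/2011"; xxx.c is skipped because it never appears in file 1.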
jaypal singh