I am trying to create a script that uses wget to download a data set and then awk to sort through the file and report the most common filter used, which is in column 14 ($14). So far I have the wget part working, as seen below:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv 

But would I then pipe that to an awk script, or should I try to do it all in one script? Also, I know how you would count a particular word; it would be something like

$14=="charcoal" {++charcoal} 

but I am not sure how to implement this in an awk script. Any advice or help would be greatly appreciated.

Thanks, kevin

2 Answers

This prints the type of filter that occurs most often.

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    {
        # Count how many times each value in column 14 appears
        filters[$14]++
    }
    END {
        # Walk the counts and remember the filter with the highest one
        for (filter in filters) {
            if (filters[filter] > max) {
                max = filters[filter]
                type = filter
            }
        }
        print type
    }'

You can easily print each of the types and their counts, if you prefer. AWK can do the sorting, if needed, or you can use the external sort utility.
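For example, a minimal variant (assuming the same URL and column layout) that prints every filter type with its count, ordered by frequency with the external sort utility:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    { filters[$14]++ }
    END {
        # Emit "count filter" pairs, one per line, for sort to order
        for (filter in filters)
            print filters[filter], filter
    }' | sort -rn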

Dennis Williamson

I would use uniq to handle the counting:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | cut -d, -f14 | sort | uniq -c
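To reduce that to just the single most common filter, sort the counts numerically and keep the first line (a small extension of the same pipeline):

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | cut -d, -f14 | sort | uniq -c | sort -rn | head -n1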

Note that this isn't going to handle quoted fields containing a comma correctly. If you need to handle that, you need something that actually understands the CSV format, such as Python's csv module:

python -c 'import csv; import sys; [sys.stdout.write(row[13]+"\n") for row in csv.reader(sys.stdin)]'

(Note the zero-based index: column 14 is row[13].)
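Plugged into the same counting pipeline (a sketch, assuming the same URL), that becomes:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | python -c 'import csv; import sys; [sys.stdout.write(row[13]+"\n") for row in csv.reader(sys.stdin)]' | sort | uniq -c | sort -rn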
mgorven