I am trying to create a script that uses wget to download a data set and then awk to sort through the file and report the most common filter used, which is in column 14 ($14). So far I have the wget part working, as seen below:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv 

But would I then pipe that to an awk script, or should I try to do it all in one script? Also, I know how you would count a particular word; it would be something like

$14=="charcoal" {++charcoal} 

but I am not sure how to implement this in an awk script. Any advice or help would be greatly appreciated.

Thanks, kevin

2 Answers

This prints the type of filter that occurs most often.

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    {
        # Count how many times each value in column 14 appears
        filters[$14]++
    }
    END {
        # Walk the counts and remember the filter with the highest one
        for (filter in filters) {
            if (filters[filter] > max) {
                max = filters[filter]
                type = filter
            }
        }
        print type
    }'

You can easily print each of the types and their counts, if you prefer. AWK can do the sorting, if needed, or you can use the external sort utility.
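For example, a minimal variant (assuming the same URL and column layout) that prints every filter type with its count, ordered by frequency with the external sort utility:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    { filters[$14]++ }
    END {
        # Emit "count filter" pairs, one per line, for sort to order
        for (filter in filters)
            print filters[filter], filter
    }' | sort -rn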

Dennis Williamson

I would use uniq to handle the counting:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | cut -d, -f14 | sort | uniq -c
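To reduce that to just the single most common filter, sort the counts numerically and keep the first line (a small extension of the same pipeline):

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | cut -d, -f14 | sort | uniq -c | sort -rn | head -n1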

Note that this isn't going to handle quoted fields containing a comma correctly. If you need to handle that, you need something that actually understands the CSV format, such as Python's csv module:

python -c 'import csv; import sys; [sys.stdout.write(row[13]+"\n") for row in csv.reader(sys.stdin)]'

(Note the zero-based index: column 14 is row[13].)
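Plugged into the same counting pipeline (a sketch, assuming the same URL), that becomes:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | python -c 'import csv; import sys; [sys.stdout.write(row[13]+"\n") for row in csv.reader(sys.stdin)]' | sort | uniq -c | sort -rn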
mgorven