-2

I am trying to write a code that would calculate average temperature (reducer.py) based on ncdc weather.

0057011060999991928010112004+67500+012067FM-12+001199999V0202001N012319999999N0500001N9+00281+99999102171ADDAY181999GF108991999999999999001001MD1710261+9999MW1801
0062011060999991928010206004+67500+012067FM-12+001199999V0201801N00931220001CN0200001N9+00281+99999100901ADDAA199002091AY121999GF101991999999017501999999MD1810461+9999
0108011060999991928010212004+67500+012067FM-12+001199999V0201601N009319999999N0100001N9+00111+99999100062ADDAY171999GF108991999011012501001001MD1810542+9999MW1681EQDQ01+000042SCOTLCQ02+100063APOSLPQ03+000542APC3  
0087011060999991928010306004+67500+012067FM-12+001199999V0202001N022619999999N0100001N9+00501+99999098781ADDAA199001091AY161999GF108991999011004501001001MD1310061+9999MW1601EQDQ01+000042SCOTLC
0057011060999991928010312004+67500+012067FM-12+001199999V0202301N01541004501CN0040001N9+00001+99999098951ADDAY161999GF108991081061004501999999MD1210201+9999MW1601
#!/usr/bin/env python

import sys

(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
  (key, val) = line.strip().split("\t")
  if last_key and last_key != key:
    print "%s\t%s" % (last_key, max_val)
    (last_key, max_val) = (key, int(val))
  else:
    (last_key, max_val) = (key, max(max_val, int(val)))

if last_key:
  print "%s\t%s" % (last_key, max_val)
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245

1 Answers1

0

First of all, your shown data has no tabs, so it's not clear why you've shown code that splits lines on tabs and finds the max. Not an average.

To find an average, you'll need to collect all seen values into a list (values.append(int(val))), then you can from statistics import mean and call mean(values) at the end of the loop

I'd highly suggest that you use mrjob or pyspark instead

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245