I have spread sheets of climate data for which, essentially, I need to transpose parts of rows into columns and vice versa. Unfortunately, the format is somewhat awkward. The data came to me with columns for year, month, number of days in the month, the type of climate data in the row, and then a successive 93 columns, each representing a daily value, succeeded and preceded by an associated flag (so 3 terms, a value and 2 flags, for each day of the month). Although months vary in length, the shorter months have been filled out with null values in the last few columns. What I want, for processing and modeling purposes, is a spreadsheet/.csv file with columns as follows:
year, month, day of month (i.e. a number 1 to 31), and then five columns representing the type of climate data (precip, snow, snow water, tmax, tmin).
If I could get columns with the appropriate flag values as well, that would be great, but it's not a priority. So, I've written the code below to unpack rows into lists (probably very inefficiently, but I'm new at this) representing year, month, type of climate variable, variable value, flag1 and flag2 based on the location in the row (corresponding to a day, 1 to 31):
import matplotlib.mlab as mlab
from matplotlib.pyplot import figure, show
import numpy as np
import scipy
import csv
durham='C:\\Users\\LocalUser\\Desktop\\Drought Data\\My_Met_Data\\USHCN\\Durham.csv'
txt='met'
station='Durham'
output=station+"_"+txt+"_"+"new"+".csv"
infile=open(durham,'r')
outfile=open(output,'w')
writer=csv.writer(outfile)
yr=[]; mon=[]; var=[]; unit=[]; flag1= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
flag2=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
valu=[]; flg1=[]; flg2=[]; prcp=[]; snow=[]; snwd=[]; tmax=[]; tmin=[]; row=[]
for line in infile:
stationid, variable, units, year, month, days, flag1[0], value[0], flag2[0], flag1[1], value[1], flag2[1], flag1[2], value[2], flag2[2],\
flag1[3], value[3], flag2[3], flag1[4], value[4], flag2[4], flag1[5], value[5], flag2[5], flag1[6], value[6], flag2[6],\
flag1[7], value[7], flag2[7], flag1[8], value[8], flag2[8] ,flag1[9], value[9], flag2[9], flag1[10], value[10], flag2[10],\
flag1[11], value[11], flag2[11], flag1[12], value[12], flag2[12], flag1[13], value[13], flag2[13], flag1[14], value[14], flag2[14],\
flag1[15], value[15], flag2[15], flag1[16], value[16], flag2[16], flag1[17], value[17], flag2[17], flag1[18], value[18], flag2[18],\
flag1[19], value[19], flag2[19], flag1[20], value[20], flag2[20], flag1[21], value[21], flag2[21], flag1[22], value[22], flag2[22],\
flag1[23], value[23], flag2[23], flag1[24], value[24], flag2[24], flag1[25], value[25], flag2[25], flag1[26], value[26], flag2[26],\
flag1[27], value[27], flag2[27], flag1[28], value[28], flag2[28], flag1[29], value[29], flag2[29], flag1[30], value[30], flag2[30]=line.split(',')
yr=[int(year)]
mon=[int(month)]
var=variable
unit=units
for yr in range(1926, 2003):
for mon in range(1,13):
if var=='PRCP':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
prcp.append(valu[j])
elif var=='SNOW':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snow.append(valu[j])
elif var=='SNWD':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
snwd.append(valu[j])
elif var=='TMAX':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmax.append(valu[j])
elif var=='TMIN':
valu=[float(i) for i in value]
flg1=[flag1]
flg2=[flag2]
for j in range(31):
tmin.append(valu[j])
row=[yr, mon, j+1, prcp[j], snow[j], snwd[j], tmax[j], tmin[j]]
writer.writerow(row)
infile.close()
outfile.close()
Now, leaving aside that I get a memory error when I run this, if I take away a few of the climate variables, then I get a do successfully get a .csv file in the format that I want. The problem is, that every single month, in every single year (1926-2002), reports the same climate data values-- that is, the data for the month of january 1926. The code is calling the data from the appropriate variable for the appropriate day, but repeating the same data month after month. I'm not sure where I've gone wrong with this, but any suggestions/help would be much appreciated.