0

I have spread sheets of climate data for which, essentially, I need to transpose parts of rows into columns and vice versa. Unfortunately, the format is somewhat awkward. The data came to me with columns for year, month, number of days in the month, the type of climate data in the row, and then a successive 93 columns, each representing a daily value, succeeded and preceded by an associated flag (so 3 terms, a value and 2 flags, for each day of the month). Although months vary in length, the shorter months have been filled out with null values in the last few columns. What I want, for processing and modeling purposes, is a spreadsheet/.csv file with columns as follows:

year, month, day of month (i.e. a number 1 to 31), and then five columns representing the type of climate data (precip, snow, snow water, tmax, tmin).

If I could get columns with the appropriate flag values as well, that would be great, but it's not a priority. So, I've written the code below to unpack rows into lists (probably very inefficiently, but I'm new at this) representing year, month, type of climate variable, variable value, flag1 and flag2 based on the location in the row (corresponding to a day, 1 to 31):

import matplotlib.mlab as mlab
from matplotlib.pyplot import figure, show
import numpy as np

import scipy
import csv

durham='C:\\Users\\LocalUser\\Desktop\\Drought Data\\My_Met_Data\\USHCN\\Durham.csv'

txt='met'
station='Durham'

output=station+"_"+txt+"_"+"new"+".csv"

infile=open(durham,'r')
outfile=open(output,'w')
writer=csv.writer(outfile)

yr=[]; mon=[]; var=[]; unit=[]; flag1=   [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
flag2=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
value=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31];\
valu=[]; flg1=[]; flg2=[]; prcp=[]; snow=[]; snwd=[]; tmax=[]; tmin=[]; row=[]
for line in infile:
  stationid, variable, units, year, month, days, flag1[0], value[0], flag2[0], flag1[1], value[1], flag2[1], flag1[2], value[2], flag2[2],\
  flag1[3], value[3], flag2[3], flag1[4], value[4], flag2[4], flag1[5], value[5], flag2[5], flag1[6], value[6], flag2[6],\
  flag1[7], value[7], flag2[7], flag1[8], value[8], flag2[8] ,flag1[9], value[9], flag2[9], flag1[10], value[10], flag2[10],\
  flag1[11], value[11], flag2[11], flag1[12], value[12], flag2[12], flag1[13], value[13], flag2[13], flag1[14], value[14], flag2[14],\
  flag1[15], value[15], flag2[15], flag1[16], value[16], flag2[16], flag1[17], value[17], flag2[17], flag1[18], value[18], flag2[18],\
  flag1[19], value[19], flag2[19], flag1[20], value[20], flag2[20], flag1[21], value[21], flag2[21], flag1[22], value[22], flag2[22],\
  flag1[23], value[23], flag2[23], flag1[24], value[24], flag2[24], flag1[25], value[25], flag2[25], flag1[26], value[26], flag2[26],\
  flag1[27], value[27], flag2[27], flag1[28], value[28], flag2[28], flag1[29], value[29], flag2[29], flag1[30], value[30], flag2[30]=line.split(',')
  yr=[int(year)]
  mon=[int(month)]
  var=variable
  unit=units

  for yr in range(1926, 2003):
     for mon in range(1,13):
        if var=='PRCP':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            prcp.append(valu[j])

        elif var=='SNOW':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            snow.append(valu[j])

        elif var=='SNWD':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            snwd.append(valu[j])

        elif var=='TMAX':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            tmax.append(valu[j])

        elif var=='TMIN':
          valu=[float(i) for i in value]
          flg1=[flag1]
          flg2=[flag2]
          for j in range(31):
            tmin.append(valu[j])

            row=[yr, mon, j+1, prcp[j], snow[j], snwd[j], tmax[j], tmin[j]]
            writer.writerow(row)


infile.close()
outfile.close()

Now, leaving aside that I get a memory error when I run this, if I take away a few of the climate variables, then I get a do successfully get a .csv file in the format that I want. The problem is, that every single month, in every single year (1926-2002), reports the same climate data values-- that is, the data for the month of january 1926. The code is calling the data from the appropriate variable for the appropriate day, but repeating the same data month after month. I'm not sure where I've gone wrong with this, but any suggestions/help would be much appreciated.

hdevs
  • 63
  • 1
  • 1
  • 4
  • Can you show an example of the input? Also, is the code you posted here properly indented? I see a few things in odd places... – Avaris Oct 17 '11 at 17:58

1 Answers1

1

Each for makes a loop; your code has two loops for years inside each other – it processes all the years with the first line of the file, then all the years for the second, etc. This is the error you're running into, but if you just go and fix it somehow, another one will pop up soon enough.


Now, please do borrow a good Python book at the library and spend some time reading and doing the exercises. Or join a course. Get a knowledgeable friend to review your code for you. StackOverflow might help you to get through a specific problem, but sadly it can't teach you the concepts. You're on the wrong path; there's nothing but trouble ahead if you just continue like this. You should go back and learn the basics a bit better, it'll make it much easier in the long run.

Computers are there to do tedious and repetitive tasks for you. You should never type out a giant list of numbers or numbered variables. Get familiar with lists (and lists of lists), and the range function.

Use descriptive names for your variables, not abbreviations. This is Python, we like things clear. And put each statement on its own line; all those semicolons look ugly. These things are important if you want to share code, get help, or just organize your own thoughts on it.

Study the documentation for the csv module and use its reader also, not just the writer.

Get familiar with list slicing, particularly the line[1::3] variety.

Learn the with statement for files.

If you do the same thing in each if/elif, move it out of there to a common place.

Here's to you becoming a good programmer one day :)

Petr Viktorin
  • 65,510
  • 9
  • 81
  • 81