I'm trying to create a CSV reader which only includes the data with readings in all of columns 6, 7 and 8 (Rainfall, Period and Quality).
My data is rainfall over the days of the year. The complication is that some of the readings are recorded over several days. The number of days a reading covers is given in row[6], and the preceding days are left blank in columns 6, 7 and 8 even though they are actually complete.
So the reader needs a counter that checks whether each row is complete (no blanks) or, if it has blanks, whether it is part of another reading (recorded over a few days) or genuinely incomplete (no readings). What I have done so far is shown here:
    import csv

    datalist = []

    def read_complete_data():
        '''Reads the file'''
        filename = input("Enter file name:")  # File must be in the same folder as the script
        with open(filename, 'r') as fileobj:
            # Open file for reading
            reader = csv.reader(fileobj, delimiter=',')
            next(reader)  # Skip the header row
            tempList = []
            for row in reader:
                if row[5] == "" and row[6] == "" and row[7] == "":
                    tempList.append(row)
                # Checks if the row is complete
                elif row[5] != "" and row[6] != "" and row[7] != "":
                    numDay = int(row[6])
                    while numDay > 1:
                        datalist.append(tempList[1-numDay])
                        numDay -= 1
Example of the data:
    Product code, Station number, Year, Month, Day, Rainfall, Period, Quality
    IDCJAC0009, 70247, 1988, 12, 21, 0, , Y
    IDCJAC0009, 70247, 1988, 12, 22, 0, , N
    IDCJAC0009, 70247, 1988, 12, 23, 0.2, 1, Y
    IDCJAC0009, 70247, 1988, 12, 24, 0.4, 1, Y
    IDCJAC0009, 70247, 1988, 12, 25, , Y
    IDCJAC0009, 70247, 1988, 12, 26, 34.8, 2, Y
    IDCJAC0009, 70247, 1988, 12, 27, 30.8, 1, N
As seen above, the first two data samples are incomplete as there is no period over which they were measured. The data sample on line 5 looks incomplete, but the following sample has a period of 2, meaning that line 5 is in fact complete; it was just measured over a 2-day span rather than a single day. This is an example of a 2-day measurement, but there are larger cases where up to 5 days are grouped into one measurement. The last column is the quality of the data, i.e. whether it was quality checked. It needs to be Y for the data to be complete. With that added, rows 1 and 2 are still incomplete, and row 7 is now incomplete as well.
Output: what I am trying to achieve is for the CSV file to be read through and the incomplete data lines to be left out of datalist. Using the temporary list, I was trying to fill datalist with only the complete data sets.
Wanted output:
    Product code, Station number, Year, Month, Day, Rainfall, Period, Quality
    IDCJAC0009, 70247, 1988, 12, 23, 0.2, 1, Y
    IDCJAC0009, 70247, 1988, 12, 24, 0.4, 1, Y
    IDCJAC0009, 70247, 1988, 12, 25, , Y
    IDCJAC0009, 70247, 1988, 12, 26, 34.8, 2, Y
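To make the rule concrete, here is a minimal sketch of the filtering described above. It is not a tested solution for the full file. The function name `filter_complete` and the `pending` list are made up for illustration. It assumes every row has 8 fields, so a day covered by a later reading has both Rainfall and Period present but empty, and it assumes such a blank day is kept whatever its own Quality flag is. The sample is held in a string so the sketch runs without the real file.

```python
import csv
import io

# Sample rows from the question, written with all 8 fields (assumption).
SAMPLE = """Product code, Station number, Year, Month, Day, Rainfall, Period, Quality
IDCJAC0009, 70247, 1988, 12, 21, 0, , Y
IDCJAC0009, 70247, 1988, 12, 22, 0, , N
IDCJAC0009, 70247, 1988, 12, 23, 0.2, 1, Y
IDCJAC0009, 70247, 1988, 12, 24, 0.4, 1, Y
IDCJAC0009, 70247, 1988, 12, 25, , , Y
IDCJAC0009, 70247, 1988, 12, 26, 34.8, 2, Y
IDCJAC0009, 70247, 1988, 12, 27, 30.8, 1, N
"""


def filter_complete(rows):
    """Return rows that are complete or covered by a later multi-day reading."""
    complete = []
    pending = []  # blank rows that a later multi-day reading might cover
    for raw in rows:
        row = [field.strip() for field in raw]  # drop the spaces after commas
        rainfall, period, quality = row[5], row[6], row[7]
        if rainfall == "" and period == "":
            pending.append(row)
            continue
        if rainfall != "" and period != "" and quality == "Y":
            days = int(period)
            if days > 1:
                # The reading spans `days` days: keep the last days-1 blank rows.
                complete.extend(pending[-(days - 1):])
            complete.append(row)
        # Any reading, complete or not, ends the current run of blank rows.
        pending = []
    return complete


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip the title row
kept = filter_complete(reader)
print([row[4] for row in kept])  # ['23', '24', '25', '26']
```

On the sample this keeps the days 23 to 26, which matches the wanted output. Whether blank days should also need a Y in the Quality column is a guess that the real data would have to confirm.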
The next(reader) line is used since the top line of the data contains titles rather than actual data. I think the problem lies in how I've written the for loop and while loop above, which use a temporary list that is then copied back into the main list (called datalist). There could also be a line of code I am missing that is needed for it to work.
I know this is probably a confusing question and might be tough to answer as the full data is not given here, but any help with what might be wrong in my code and with reading CSV files is greatly appreciated. I thought I would put the question up here even though it is quite confusing to explain. Thanks
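For reference, a small sketch of what the while loop does with its negative index, using placeholder strings instead of real rows. The names `temp_list`, `num_day` and `picked` are only for illustration, and the period value of 3 is hypothetical.

```python
# Placeholder strings stand in for the blank rows saved in tempList.
temp_list = ["blank row, day 1", "blank row, day 2", "blank row, day 3"]
num_day = 3  # hypothetical Period value of the reading that follows them

picked = []
while num_day > 1:
    # 1 - num_day is negative, so this counts back from the end of the list.
    picked.append(temp_list[1 - num_day])
    num_day -= 1

print(picked)                    # ['blank row, day 2', 'blank row, day 3']
print(picked == temp_list[-2:])  # True
```

So for a period of 3 the loop copies the last two entries of the temporary list, in order. The reading's own row is not copied by the loop, and if the temporary list is empty the index raises an IndexError.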