I have many csv files which are "column" oriented and that I need to pre-process to finally index them.
This is time oriented data, with a very large number of columns for each "device" (up to 128 columns) like:
LDEV_XXXXXX.csv
Serial number : XXXXX(VSP)
From : 2014/06/04 05:58
To : 2014/06/05 05:58
sampling rate : 1
"No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)","00:30:02X(X2499-1)"
"242","2014/06/04 10:00",0,0,0
"243","2014/06/04 10:01",0,0,0
"244","2014/06/04 10:02",9,0,0
"245","2014/06/04 10:03",0,0,0
"246","2014/06/04 10:04",0,0,0
"247","2014/06/04 10:05",0,0,0
My goal is to transpose (if it the term is the right one) data into rows, such that i will be able to manipulate the data much more efficiently, such as:
"time",device,value
"2014/06/04 10:00","00:30:00X(X2497-1)",0
"2014/06/04 10:00","00:30:01X(X2498-1)",0
"2014/06/04 10:00","00:30:02X(X2499-1)",0
"2014/06/04 10:01","00:30:00X(X2497-1)",0
"2014/06/04 10:01","00:30:01X(X2498-1)",0
"2014/06/04 10:01","00:30:02X(X2499-1)",0
"2014/06/04 10:02","00:30:00X(X2497-1)",9
"2014/06/04 10:02","00:30:01X(X2498-1)",0
"2014/06/04 10:02","00:30:02X(X2499-1)",0
And so on...
Note: I have let the raw data (which is uses "," as a separator), you would note that I need to delete the 6 first lines the "No" column which has no interest, but this is not the main goal and difficulty)
I have a python starting code to transpose csv data, but it doesn't exactly what i need...
import csv
import sys
infile = sys.argv[1]
outfile = sys.argv[2]
with open(infile) as f:
reader = csv.reader(f)
cols = []
for row in reader:
cols.append(row)
with open(outfile, 'wb') as f:
writer = csv.writer(f)
for i in range(len(max(cols, key=len))):
writer.writerow([(c[i] if i<len(c) else '') for c in cols])
Note the number of columns are arbitrary, something a few, and up to 128 depending on files.
I'm pretty sure this is a common need but I couldn't yet find the exact python code that does this, or I couldn't get...
Edit:
More precision:
Each timestamp row will be repeated by the number of devices, so that the file will have much more lines (multiplied by the number of devices) but only a few rows (timestamp,device,value) The final desired result has been updated :-)
Edit:
I would like to be able to use the script using argument1 for infile and argument2 for outfile :-)