So, I've been writing up code to read in a dataset from a file and separate it out for analysis.
The data in question is read from a .dat file, and looks like this:
```
14 HO2 O3 OH O2 O2
15 HO2 HO2 H2O2 O2
16 H2O2 OH HO2 H2O
17 O O O2
18 O O2 O3
19 O O3 O2 O2
```
The code I've written looks like this:
```python
import numpy as np

edge_data = np.genfromtxt('Early_earth_reaction.dat', dtype=str,
                          missing_values=True, filling_values=bool)
```
The plan was to then loop over the rows of the dataset and build a list of pairs from them:
```python
edge_list = []
for i in range(360):
    # pair each of the first two columns with column 2
    edge_list.append((edge_data[i, 0], edge_data[i, 2]))
    edge_list.append((edge_data[i, 1], edge_data[i, 2]))
    print(edge_data[i, 0])
    # columns 3 and 4 are only filled in on some rows
    if edge_data[i, 3] is not None:
        edge_list.append((edge_data[i, 0], edge_data[i, 3]))
        edge_list.append((edge_data[i, 1], edge_data[i, 3]))
    if edge_data[i, 4] is not None:
        edge_list.append((edge_data[i, 0], edge_data[i, 4]))
        edge_list.append((edge_data[i, 1], edge_data[i, 4]))
```
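For reference, the pairing this loop is aiming for (each of the first two species matched with every later species in the same row) can be written without fixed column indices. This is a minimal sketch that assumes the row layout of the sample above: a reaction number, two reactants, then one or more products. `edges_for_row` is a hypothetical helper name, and `itertools.product` stands in for the hand-written branches, so rows with one, two or three products need no special cases. Note that it emits the pairs in a different order from the loop (all pairs for the first reactant, then the second).

```python
from itertools import product


def edges_for_row(tokens):
    """Pair each reactant with each product for one whitespace-split row.

    Assumed layout (based on the sample rows): reaction number,
    two reactants, then one or more products.
    """
    reactants = tokens[1:3]
    products = tokens[3:]
    # product() yields every (reactant, product) combination, however
    # many products the row happens to have.
    return list(product(reactants, products))


print(edges_for_row("17 O O O2".split()))
# [('O', 'O2'), ('O', 'O2')]
```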
However, when I run it, I get this error:
```
  File "read_early_earth.py", line 52, in main
    edge_data=np.genfromtxt('Early_earth_reaction.dat', dtype = str,
        usecols=(1,2,3,4,5), missing_values=True, filling_values=bool)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1667, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #6 (got 4 columns instead of 5)
    Line #14 (got 6 columns instead of 5)
    Line #17 (got 4 columns instead of 5)
```
And so on. As far as I can tell, this happens because the rows do not all have the same number of columns (some lines list more species than others), and `genfromtxt` expects every row to have the same number of columns.
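One quick way to confirm this diagnosis is to count the whitespace-separated fields on each line. The minimal sketch below uses only the standard library; the six sample rows from above are pasted inline to keep it self-contained (in practice you would pass the open file instead), and `column_counts` is a hypothetical helper name.

```python
from collections import Counter

# The six sample rows from the question, inlined for a self-contained sketch.
SAMPLE = """\
14 HO2 O3 OH O2 O2
15 HO2 HO2 H2O2 O2
16 H2O2 OH HO2 H2O
17 O O O2
18 O O2 O3
19 O O3 O2 O2
"""


def column_counts(lines):
    """Map 'number of whitespace-separated fields' -> 'number of rows'."""
    return Counter(len(line.split()) for line in lines if line.strip())


print(sorted(column_counts(SAMPLE.splitlines()).items()))
# [(4, 2), (5, 3), (6, 1)]
```

More than one key in the result means the file is ragged, which is exactly what `genfromtxt` rejects.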
Is there a workaround for this in numpy? Alternatively, is there another way to accomplish this task? I know that, if worst comes to worst, I can torture some regular expressions into doing the job, but I'd prefer a more efficient method if at all possible.
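One possible workaround on the numpy side (a sketch for Python 3, not the only option) is to skip `genfromtxt` for this file: split each line yourself, pad the short rows to a common width, and only then build the array. `load_ragged` and the empty-string fill value are assumptions for illustration. With this padding the missing columns hold `''`, so a test like `edge_data[i, 3] != ''` would take the place of the comparison against `None`. This version keeps the reaction-number column, so the indices shift by one compared with the `usecols=(1,2,3,4,5)` call in the traceback.

```python
import io

import numpy as np


def load_ragged(fh, fill=""):
    """Read whitespace-separated rows of varying length into a 2-D string array.

    Short rows are padded with `fill` so every row has the same width;
    numpy then accepts the nested list as a rectangular array.
    """
    rows = [line.split() for line in fh if line.strip()]
    width = max((len(row) for row in rows), default=0)
    padded = [row + [fill] * (width - len(row)) for row in rows]
    return np.array(padded, dtype=str)


sample = io.StringIO("17 O O O2\n14 HO2 O3 OH O2 O2\n")
data = load_ragged(sample)
print(data.shape)        # (2, 6)
print(data[0].tolist())  # ['17', 'O', 'O', 'O2', '', '']
```

Applied to the real file, this would look something like `with open('Early_earth_reaction.dat') as fh: edge_data = load_ragged(fh)`.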
Thanks!