
My data file is like this:

abb
sdsdfmn
sfdf sdf

2011-12-05 11:00                                         1.0        9.0        
2011-12-05 12:00                                        44.9        2.0        
2011-12-05 13:00                                        66.8        4.2       
2011-12-05 14:00       22.8        1.0       26.2       45.2        2.3      
2011-12-05 15:00       45.7        2.0       45.0       45.6        1.4      
2011-12-05 16:00       23.2        3.0      456.2       11.7        1.5      
2011-12-05 17:00       67.4        4.0      999.1       45.8        0.9  
2011-12-05 18:00                                        34.4        1.2
2011-12-05 19:00       12.4        4.2      345.1       11.1        7.6

I used numpy genfromtxt:

data = np.genfromtxt('data.txt', usecols=(0,1,3), skip_header=4, dtype=[('date','S10'),('hour','S5'),('myfloat','f8')])

The problem is that column 3 has some empty values (at the beginning and later on), so it reads the wrong column.

I tried the delimiter parameter, because all float columns have a fixed width (delimiter=[10,5,5]), but that also fails. Is there a workaround?
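To make the misalignment concrete, here is a minimal sketch, not a definitive fix. It rebuilds a few rows of the sample in memory, shows that whitespace splitting yields fewer tokens on rows with blank cells, and then passes genfromtxt a width for every field in the line. The widths (10, 6, 11, 11, 11, 11, 11) are an assumption counted from the sample above, and WIDTHS and make_sample are hypothetical names used only for illustration.

```python
import io
import numpy as np

# Assumed field widths, counted from the sample rows in the question:
# date (10), space + hour (6), then five right-aligned float fields (11 each).
WIDTHS = [10, 6, 11, 11, 11, 11, 11]


def make_sample():
    """Rebuild a few rows of the sample file with exact column alignment."""
    rows = [
        ("2011-12-05", "11:00", "", "", "", "1.0", "9.0"),
        ("2011-12-05", "14:00", "22.8", "1.0", "26.2", "45.2", "2.3"),
        ("2011-12-05", "18:00", "", "", "", "34.4", "1.2"),
        ("2011-12-05", "19:00", "12.4", "4.2", "345.1", "11.1", "7.6"),
    ]
    body = "\n".join(
        "".join(value.rjust(width) for value, width in zip(row, WIDTHS))
        for row in rows
    )
    # Three header lines plus one blank line, as in the question.
    return "abb\nsdsdfmn\nsfdf sdf\n\n" + body + "\n"


text = make_sample()

# Whitespace splitting collapses the run of blank cells into one separator,
# so the first data row has fewer tokens than a fully populated row.
first_row = text.splitlines()[4]
tokens = first_row.split()

# Fixed-width parsing: one width per field in the line, not only the used ones.
data = np.genfromtxt(
    io.StringIO(text),
    delimiter=WIDTHS,
    usecols=(0, 1, 3),
    skip_header=4,
    autostrip=True,  # strip the padding around each fixed-width field
    dtype=[("date", "U10"), ("hour", "U5"), ("myfloat", "f8")],
    encoding="utf-8",
)

print(tokens)
print(data["myfloat"])  # blank cells are filled with nan
```

With seven fields per line, usecols index 3 is the second float column; change the index if a different column is wanted, and recount the widths against the real file.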

mcatis
  • `genfromtxt` isn't as good as `pandas` at handling this sort of missing data. Or you could write your own reader in basic Python; that shouldn't be hard. What's the failure with the fixed column definition? – hpaulj Nov 14 '18 at 16:26
  • When using the fixed width, it only looks at the columns that have values in them. It doesn't find the gaps; there are basically only spaces in between. – mcatis Nov 14 '18 at 16:46
  • It reads and splits a line one at a time. The layout of line 4 does not affect how it handles line 1. – hpaulj Nov 14 '18 at 17:05
  • Let's say we have line 1: I think it selects "2011-12-05", "11:00" and "9.0". What I need is: "2011-12-05", "11:00" and " ". In line 4: "2011-12-05", "14:00" and "26.2". That's correct. – mcatis Nov 14 '18 at 17:26
  • I know what you want. But `genfromtxt` has no way of knowing that the big block of spaces in the middle of line 1 is 'supposed' to contain 3 columns. With the default `whitespace` delimiter, that's just one delimiter, not 3. With the fixed width delimiter, you need to account for all columns, not just the 'used' ones. – hpaulj Nov 14 '18 at 17:57
  • OK! I found this workaround: create a pandas DataFrame: df = pandas.read_fwf('data.txt', colspecs='infer', skiprows=4). Works fine! – mcatis Nov 14 '18 at 18:44
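
The read_fwf workaround from the last comment can be sketched as follows, run against an in-memory copy of a few sample rows instead of data.txt. The column widths used to rebuild the sample are an assumption counted from the question, and header=None is an addition not present in the comment: without it, pandas would treat the first data row as column names.

```python
import io
import pandas as pd

# Assumed layout of the sample: date (10), space + hour (6), five floats (11 each).
WIDTHS = [10, 6, 11, 11, 11, 11, 11]

rows = [
    ("2011-12-05", "11:00", "", "", "", "1.0", "9.0"),
    ("2011-12-05", "14:00", "22.8", "1.0", "26.2", "45.2", "2.3"),
    ("2011-12-05", "18:00", "", "", "", "34.4", "1.2"),
    ("2011-12-05", "19:00", "12.4", "4.2", "345.1", "11.1", "7.6"),
]
body = "\n".join(
    "".join(value.rjust(width) for value, width in zip(row, WIDTHS))
    for row in rows
)
# Three header lines plus one blank line, as in the question.
text = "abb\nsdsdfmn\nsfdf sdf\n\n" + body + "\n"

# colspecs="infer" lets pandas detect column boundaries from the rows it samples;
# header=None (an assumption, see above) keeps the first data row as data.
df = pd.read_fwf(io.StringIO(text), colspecs="infer", skiprows=4, header=None)

print(df)
```

Inference only sees gaps that are blank in every sampled row, so if it picks the wrong boundaries, passing explicit widths through the widths parameter of read_fwf is an alternative.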
