Read file with missing data with loadtxt (numpy)

Question

When I tried to read the data below with:

loadtxt('RSTN')

I got an error.
Then I tried to fill in the missing data using:

genfromtxt('RSTN',delimiter=' ')

But I got this error:

Line #31112 (got 7 columns instead of 8)

I'd like to fill the missing data with nan, or something similar.

I have data like this in an ASCII file named RSTN:

 20120127165126     19     42     54     91    113    147    188    284
 20120127165127     19     42     54     91    113    147    188    284
 20120127165128     19     42     54     90    113    147    188    284
 20120127165129     19     42     54     90    113    147    188    284
 20120127165130     19     42     54     88    107    131    155    235
 20120127165131     19     42     54     72     79     79     92    154
 20120127165132     19     42     54     45     43     42     50     97
 20120127165133     19     42     54     24     21     21     25     65
 20120127165134     19     42     54     11      8      9     12     46
 20120127165135     19     42     54      5      2      3      7     35
 20120127165136     18     42     54      2      0      1      4     29
 20120127165137     19     42     54      0             0      2     25
 20120127165138     19     42     53      0             0      1     22
 20120127165139     19     42     54      0             0      1     19
 20120127165140     19     42     54      0             0      0     17
 20120127165141     19     42     54      0             0      0     14
 20120127165142     19     42     54      0             0      0     14
 20120127165143     19     42     54      0             0      0     14
 20120127165144     19     42     54      0                    0     13
 20120127165145     19     42     54      0                    0     14
 20120127165146     19     42     54      0             0      0     14
 20120127165147     19     42     54      0             0      1     15
 20120127165148     19     42     54      0             0      1     15
 20120127165149     19     42     54      0             0      1     15
 20120127165150     20     42     53      0                    1     15
 20120127165151     20     42     53      0                    1     17
 20120127165152     20     42     53      0                    1     17
 20120127165153     19     42     53      0             0      1     17
 20120127165154     20     42     53      0                    1     17
 20120127165155     20     42     53      0                    1     17
 20120127165156     20     42     53      0             0      1     17
 20120127165157     19     42     54      0             0      1     17
 20120127165158     19     42     55      0             0      1     17
 20120127165159     19     42     55      0             0      1     17
 20120127165200     20     42     56      0             0      1     17
 20120127165201     21     42     56      0             0      1     17

When I did this:

from pandas import *
data=read_fwf('26JAN12.K7O', colspecs='infer', header=None)

I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 429, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 592, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1954, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1237, in __init__
    self._make_reader(f)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1957, in _make_reader
    self.data = FixedWidthReader(f, self.colspecs, self.delimiter)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1933, in __init__
    raise AssertionError()
AssertionError

unutbu · Accepted Answer · 2014-03-11T21:53:37.360


If you have pandas, you could parse it with pd.read_fwf:

import pandas as pd
df = pd.read_fwf('data', colspecs='infer', header=None, parse_dates=[[0]])
print(df)

yields

                     0   1   2   3   4    5    6    7    8
0  2012-01-27 16:51:26  19  42  54  91  113  147  188  284
1  2012-01-27 16:51:27  19  42  54  91  113  147  188  284
...
11 2012-01-27 16:51:37  19  42  54   0  NaN    0    2   25
12 2012-01-27 16:51:38  19  42  53   0  NaN    0    1   22
13 2012-01-27 16:51:39  19  42  54   0  NaN    0    1   19

[36 rows x 9 columns]
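
If `colspecs='infer'` is unavailable or guesses the wrong boundaries, the column widths can be passed explicitly instead. The following is a minimal sketch, assuming the layout of an 18-character timestamp followed by eight 7-character fields; the three-row in-memory sample and the helper `make_row` are made up for illustration and are not part of the real file:

```python
import io
import pandas as pd

def make_row(stamp, values):
    # Hypothetical helper: right-align the timestamp in 18 characters and
    # each value in 7 characters; None becomes a blank (missing) field.
    cells = ["" if v is None else str(v) for v in values]
    return f"{stamp:>18}" + "".join(f"{c:>7}" for c in cells)

# Made-up sample rows that mimic the gaps in the real data.
sample = "\n".join([
    make_row(20120127165136, [18, 42, 54, 2, 0, 1, 4, 29]),
    make_row(20120127165137, [19, 42, 54, 0, None, 0, 2, 25]),
    make_row(20120127165144, [19, 42, 54, 0, None, None, 0, 13]),
])

# widths= takes one integer per column; blank fields are read as NaN.
df = pd.read_fwf(io.StringIO(sample), widths=[18] + [7] * 8, header=None)

# Convert the first column to timestamps explicitly.
df[0] = pd.to_datetime(df[0].astype(str), format="%Y%m%d%H%M%S")
print(df)
```

Converting the first column with `pd.to_datetime` and an explicit format is a design choice here: it keeps the date handling separate from the parsing step and does not depend on how `parse_dates` behaves in a given pandas version.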

Or, thanks to DSM, using np.genfromtxt you can parse fixed-width data by passing a list of widths to the delimiter parameter:

import numpy as np
np.set_printoptions(formatter={'float':'{:g}'.format})
arr = np.genfromtxt('data', delimiter=[18]+[7]*8)
print(arr)

yields

[[2.01201e+13 19 42 54 91 113 147 188 284]
 [2.01201e+13 19 42 54 91 113 147 188 284]
 [2.01201e+13 19 42 54 90 113 147 188 284]
...
 [2.01201e+13 19 42 54 0 nan 0 2 25]
 [2.01201e+13 19 42 53 0 nan 0 1 22]
 [2.01201e+13 19 42 54 0 nan 0 1 19]
...]
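
Once the array is loaded, the gaps can be counted with `np.isnan`, or replaced at load time with a sentinel via `filling_values`. A small sketch under the same layout assumption (18 characters, then eight fields of 7); the rows below are made up and built in memory rather than read from the real file:

```python
import io
import numpy as np

# Made-up rows; "" marks a missing field.
rows = [
    ("20120127165136", ["18", "42", "54", "2", "0", "1", "4", "29"]),
    ("20120127165137", ["19", "42", "54", "0", "", "0", "2", "25"]),
    ("20120127165144", ["19", "42", "54", "0", "", "", "0", "13"]),
]
# Pad every field so each line follows the 18 + 8*7 fixed-width layout.
text = "\n".join(
    stamp.rjust(18) + "".join(v.rjust(7) for v in vals) for stamp, vals in rows
)

widths = [18] + [7] * 8

# Blank fields become nan in the default float array.
arr = np.genfromtxt(io.StringIO(text), delimiter=widths)
missing_per_column = np.isnan(arr).sum(axis=0)
print(missing_per_column)

# filling_values replaces the nan placeholder with a chosen sentinel.
filled = np.genfromtxt(io.StringIO(text), delimiter=widths, filling_values=-1)
print(filled[:, 5])
```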

edited Mar 11 '14 at 21:53

answered Mar 11 '14 at 21:11

unutbu


When I do that, I get this error: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 429, in read_fwf return _read(filepath_or_buffer, kwds) It would be good if I could convert the missing data into NaN – nandhos Mar 11 '14 at 21:19
Could you post the full traceback in your original post? It's hard to read in the comments and seems to be lacking the exception line. – unutbu Mar 11 '14 at 21:21
That's it, but the complete file actually has many rows (around 36000, one value per second) – nandhos Mar 11 '14 at 21:33
Mild dissent from the rejection of `genfromtxt` as an option; `delimiter` accepts a list of widths, e.g. `np.genfromtxt("RSTN", delimiter=(18,)+(7,)*7)`. And I might have tossed a `parse_dates=[0]` in there just to show off on the `pandas` side. :^) – DSM Mar 11 '14 at 21:34
@DSM: Oh my, I never realized np.genfromtxt could parse fixed-width data. Thanks for the correction! – unutbu Mar 11 '14 at 21:35
@nandhos: Do you get an error when you use DSM's suggestion: `np.genfromtxt('RSTN', delimiter=[18]+[7]*8)`? – unutbu Mar 11 '14 at 21:43
@nandhos: It appears you have a (relatively) old version of pandas installed. The `colspecs='infer'` option was introduced in pandas version 0.13. In that case, try passing the widths explicitly: `pd.read_fwf('data', widths=[18]+[7]*8, header=None, parse_dates=[[0]])`. – unutbu Mar 11 '14 at 21:48
@unutbu, It works, but could you please explain what `delimiter=[18]+[7]*8` means? – nandhos Mar 11 '14 at 21:58
It means that the first column is 18 characters wide, and the next 8 columns are 7 characters wide. Note that `[18]+[7]*8` is just shorthand for `[18, 7, 7, 7, 7, 7, 7, 7, 7]`. – unutbu Mar 11 '14 at 22:10
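
To make the width list concrete, here is a rough pure-Python sketch of how a list of widths turns into slices of one line. The helper `split_fixed` is illustrative only and is not how NumPy exposes this; the sample line is made up and padded to the assumed 18 + 8×7 layout:

```python
from itertools import accumulate

widths = [18] + [7] * 8           # same as [18, 7, 7, 7, 7, 7, 7, 7, 7]
edges = [0, *accumulate(widths)]  # column boundaries: 0, 18, 25, ..., 74

def split_fixed(line):
    # Illustrative helper (not a NumPy function): cut the line at each
    # boundary and strip the padding; a blank field comes back as "".
    return [line[a:b].strip() for a, b in zip(edges, edges[1:])]

# Made-up line with one missing field, padded to the fixed-width layout.
line = "20120127165137".rjust(18) + "".join(
    v.rjust(7) for v in ["19", "42", "54", "0", "", "0", "2", "25"]
)
fields = split_fixed(line)
print(fields)
```

Because the columns are cut by position rather than by whitespace, an empty field still occupies its slot, which is why a space delimiter reports the wrong column count while a width list does not.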

score 0 · Answer 2 · answered Apr 20 '21 at 23:39


I had a similar problem, reading from a tab-separated file with missing data. If you can get your data in a tab-separated format, something like this would work:

import pandas as pd

df = pd.read_csv('RSTN', sep='\t', header=None)
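
As a quick check of how this behaves with gaps, here is a small sketch on a made-up two-row tab-separated sample (not the real RSTN file), where two consecutive tabs mark a missing field:

```python
import io
import pandas as pd

# Made-up tab-separated sample; the second row has an empty sixth field.
sample = (
    "20120127165136\t18\t42\t54\t2\t0\t1\t4\t29\n"
    "20120127165137\t19\t42\t54\t0\t\t0\t2\t25\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Empty fields are read as NaN; count them per column.
print(df.isna().sum())
```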

answered Apr 20 '21 at 23:39

Stefano Maffei

