
I am using the read_table command in pandas/Python to import a tab-delimited text file.

q_data_1 = pd.read_table('data.txt', skiprows=6, dtype={'numbers': np.float64})

...but get

AttributeError: 'NoneType' object has no attribute 'dtype'

Without the dtype parameter, the column is imported as an 'object' dtype.

The 'numbers' column I think has missing data which trips up the import. How do I ignore these values?
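For reference, a minimal way to reproduce the situation and sidestep it is to read the column as-is and then coerce it with `pd.to_numeric` (a helper that postdates the original question; the `'missing'` placeholder token below is made up for illustration):

```python
import io
import pandas as pd

# Simulate a tab-delimited file whose 'numbers' column contains a bad value.
raw = "numbers\tlabel\n1.5\ta\nmissing\tb\n3.0\tc\n"
df = pd.read_csv(io.StringIO(raw), sep='\t')

# The bad token forces an object dtype on import.
# Coercing turns anything unparseable into NaN instead of raising an error.
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')
```

After coercion the column is `float64`, with `NaN` where the unparseable value was.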

EDIT (25-May-13): Any idea how to do this with columns that contain (i) times (e.g. '00:03:06'), (ii) dates (e.g. '2002-03-11'), (iii) percentages (e.g. '32.81%'), and (iv) numbers with commas (e.g. '10,982')? All of these are imported as objects. How do I convert them to appropriate dtypes? (I have edited the question to reflect this.)
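One possible approach for all four column types from the edit (a sketch with made-up sample data, not from the original post) is to convert each column after import with pandas' built-in parsers and string methods:

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['00:03:06', '01:15:00'],
    'date': ['2002-03-11', '2002-03-12'],
    'pct':  ['32.81%', '5.00%'],
    'num':  ['10,982', '1,002'],
})

df['time'] = pd.to_timedelta(df['time'])                   # -> timedelta64[ns]
df['date'] = pd.to_datetime(df['date'])                    # -> datetime64[ns]
df['pct']  = df['pct'].str.rstrip('%').astype(float) / 100 # -> float64
df['num']  = df['num'].str.replace(',', '').astype(float)  # -> float64
```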

user7289

1 Answer


After you've read in the DataFrame (without restricting dtype) you can then convert it (using technique from this post) with apply:

import locale
import pandas as pd

# Use locale-aware parsing so thousands separators are handled.
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
df = pd.DataFrame([['1,002.01'], ['300,000,000.1'], ['10']], columns=['numbers'])

In [4]: df['numbers']
Out[4]:
0         1,002.01
1    300,000,000.1
2               10
Name: numbers, dtype: object

In [5]: df['numbers'].apply(locale.atof)
Out[5]:
0    1.002010e+03
1    3.000000e+08
2    1.000000e+01
Name: numbers, dtype: float64

In [6]: df['numbers'] = df['numbers'].apply(locale.atof)
Andy Hayden
  • BTW any idea how to do the same thing with a columns that contain (i) time (e.g. '00:03:06') (ii) date (e.g. '2002-03-11') and percentages ('32.81%')? All of which convert to objects. (I have edited Q to reflect) – user7289 May 25 '13 at 10:59
  • You shouldn't edit the question, but rather ask a new one :). It's essentially the same trick in both cases (just define a function which does it to a single string and then apply it the column). – Andy Hayden May 25 '13 at 11:15
  • Okay will do. Is this the efficient way of doing it? As I am using Pandas because it efficiently handles large data sets using essentially C-libraries. – user7289 May 25 '13 at 18:29
  • 1
    It's a good question, but certainly my advice is use this one and see if it is efficient enough, I think it will be reasonably efficient. There is a converters argument to read_csv which could be worth investigating... – Andy Hayden May 25 '13 at 18:51
  • Thanks for this, appreciate your help. Might be worth asking as a separate question ;) – user7289 May 25 '13 at 18:56
  • Heavy use of the `%timeit` helper is always recommended :) – Andy Hayden May 25 '13 at 18:59
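The `converters` argument mentioned in the comments applies a per-column function at parse time, so the conversion happens during the read itself. A minimal sketch (the column names and sample data here are hypothetical):

```python
import io
import pandas as pd

raw = "numbers\tpct\n1,002.01\t32.81%\n10\t5%\n"

# Each converter receives the raw cell string and returns the parsed value.
df = pd.read_csv(
    io.StringIO(raw),
    sep='\t',
    converters={
        'numbers': lambda s: float(s.replace(',', '')),
        'pct': lambda s: float(s.rstrip('%')) / 100,
    },
)
```

Both columns come out as `float64`, with no second pass over the DataFrame.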