dtype=None tells genfromtxt to guess the appropriate dtype.
From the docs:
dtype: dtype, optional
Data type of the resulting array. If None, the dtypes will be
determined by the contents of each column, individually.
(my emphasis.)
Since your data is comma-separated, be sure to include delimiter=','. Otherwise np.genfromtxt splits on whitespace, so every column except the last includes a string character (the comma) and is mistakenly assigned a string dtype.
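To see the effect of the delimiter without touching a file, here is a minimal sketch that reads the same text twice from memory. The SAMPLE rows are an assumption reconstructed from the output shown in this answer, column_kinds is a hypothetical helper, and encoding="utf-8" is passed only so the string columns come back as unicode on any recent NumPy.

```python
import io

import numpy as np

# Assumed sample rows, reconstructed from the output shown in this answer.
SAMPLE = (
    "999.9, abc, 34, 78.0, 12.3\n"
    "1.3, ghf, 12, 8.4, 23.7\n"
    "101.7, evf, 89, 2.4, 11.3\n"
)


def column_kinds(**kwargs):
    """Return the dtype kind guessed for each column ('f', 'i', 'U', ...)."""
    arr = np.genfromtxt(
        io.StringIO(SAMPLE), dtype=None, encoding="utf-8", **kwargs
    )
    return [arr.dtype[name].kind for name in arr.dtype.names]


# Without a delimiter, fields are split on whitespace and keep their commas,
# so only the last column can be parsed as a number.
no_delim_kinds = column_kinds()

# With delimiter=',', each column gets a dtype matching its contents.
comma_kinds = column_kinds(delimiter=",")

print(no_delim_kinds)
print(comma_kinds)
```

A kind of 'U' means a unicode string column, 'f' a float column, and 'i' an integer column.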
For example:
import numpy as np
arr = np.genfromtxt('data', dtype=None, delimiter=',')
print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]
This shows the names and dtypes of each column. For example, ('f2', '<i4') means the third column has name 'f2' and is of dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype then there are a few options.
- You could manually edit the data by adding a decimal point in the
third column to force genfromtxt to interpret values in that column
to be of a float dtype.
- You could supply the dtype explicitly in the call to genfromtxt:
arr = np.genfromtxt(
'data', delimiter=',',
dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])
print(arr)
# [(999.9, ' abc', 34.0, 78.0, 12.3) (1.3, ' ghf', 12.0, 8.4, 23.7)
# (101.7, ' evf', 89.0, 2.4, 11.3)]
print(arr['f2'])
# [34. 12. 89.]
The error message IndexError: invalid index
is being generated by the line
ionenergy = y[:,0]
When you have mixed dtypes, np.genfromtxt returns a structured array. It is worth reading up on structured arrays, because the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype.
Instead of y[:, 0], to access the first column of the structured array y, use
y['f0']
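The difference can be reproduced without a data file. The sketch below builds a small structured array by hand (the values and the three-field dtype are assumptions chosen for illustration, not your actual data) and shows that 2-D style indexing fails while field indexing works.

```python
import numpy as np

# A hand-built structured array standing in for the genfromtxt result.
y = np.array(
    [(999.9, "abc", 34), (1.3, "ghf", 12), (101.7, "evf", 89)],
    dtype=[("f0", "<f8"), ("f1", "U4"), ("f2", "<i4")],
)

# A structured array of records is one-dimensional: one element per row.
print(y.shape)

error = None
try:
    y[:, 0]  # two indices on a 1-D array
except IndexError as exc:
    error = exc
    print("IndexError:", exc)

first_column = y["f0"]  # select a column by field name
first_row = y[0]        # select a row by integer position
print(first_column)
print(first_row["f2"])  # a single value: row 0, field 'f2'
```

Rows are selected by position and columns by name, so y[0]['f2'] and y['f2'][0] refer to the same value.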
Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:
import numpy as np
arr = np.genfromtxt(
'data', delimiter=',', dtype=None,
names=['ionenergy', 'foo', 'bar', 'baz', 'quux'])
print(arr['ionenergy'])
# [ 999.9 1.3 101.7]
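For a version that runs without the data file, the following sketch reads assumed sample rows from memory and supplies one name per column. Apart from 'ionenergy', the names are placeholders, and encoding="utf-8" is an addition so that string fields come back as unicode on recent NumPy versions.

```python
import io

import numpy as np

# Assumed sample rows, reconstructed from the output shown in this answer.
SAMPLE = (
    "999.9, abc, 34, 78.0, 12.3\n"
    "1.3, ghf, 12, 8.4, 23.7\n"
    "101.7, evf, 89, 2.4, 11.3\n"
)

# One name per column; everything except 'ionenergy' is a placeholder.
names = ["ionenergy", "foo", "bar", "baz", "quux"]

arr = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    dtype=None,
    names=names,
    encoding="utf-8",
)

print(arr.dtype.names)   # the field names now match the list above
print(arr["ionenergy"])  # first column, selected by name
print(arr["bar"])        # third column, guessed as an integer dtype
```

The string column keeps the space that follows each comma; passing autostrip=True to np.genfromtxt should remove it if that matters for your data.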