-1

i'm trying to create an array which has 5 columns imported from a data file. The 4 of them are floats and the last one string.

The data file looks like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa

I tried these:

data = np.genfromtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

data = np.loadtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

,but both codes import only the first column.

Why? How can i fix this?

Ty for your time! :)

crystal
  • 103
  • 6

1 Answers1

1

You must specify correctly the str type : "U20" for exemple for 20 characters max :

data = np.loadtxt('data.txt', dtype = "float,"*4 + "U20", delimiter = ",")

seems to work :

array([( 5.1,  3.5,  1.4,  0.2, 'Iris-setosa'),
       ( 4.9,  3. ,  1.4,  0.2, 'Iris-setosa'),
       ( 4.7,  3.2,  1.3,  0.2, 'Iris-setosa'),
       ( 4.6,  3.1,  1.5,  0.2, 'Iris-setosa'),
       ( 5. ,  3.6,  1.4,  0.2, 'Iris-setosa'),
       ( 5.4,  3.9,  1.7,  0.4, 'Iris-setosa'),
       ( 4.6,  3.4,  1.4,  0.3, 'Iris-setosa'),
       ( 5. ,  3.4,  1.5,  0.2, 'Iris-setosa')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U20')])

An other method using pandas give you an object array, but this slow down further computations :

In [336]: pd.read_csv('data.txt',header=None).values
Out[336]: 
array([[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
       [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
       [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
       [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
       [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
       [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'],
       [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'],
       [5.0, 3.4, 1.5, 0.2, 'Iris-setosa']], dtype=object)
B. M.
  • 18,243
  • 2
  • 35
  • 54