I am trying to read data from a CSV file with 7 columns. The column at index 5 holds strings, while the rest of the columns are floats.

When I use the following command to read only the float columns, the output is in the proper format.

data = np.loadtxt('data.csv', delimiter=',', usecols= (0,1,2,3,4,6))
print "\ndata=\n",data

Output is

data=
 [[  3.00000000e+00   9.46000000e+01   1.80180000e+02   3.28900000e+01
    6.80685824e+00   3.70000000e-01]
...,
 [  3.00000000e+00   1.33200000e+02   2.51460000e+02   2.01600000e+01
    5.77236048e+00  -2.70000000e-01]]

with shape of (500L, 6L)

But when I try to read all the columns, including the string column at index 5, I use the following code:

datastr = np.loadtxt('data.csv', delimiter=',',
                     dtype={'names': ('c1','c2','c3','c4','c5','c6','c7'),
                            'formats': ('f4','f4','f4','f4','f4','S10','f4')})
print "\ndatastr=\n",datastr

Now the output is

datastr= 
[ ( 3.,   94.59999847,  180.17999268,  32.88999939,   6.80685806, 'Large',  0.37      ) ... ( 3.,  133.19999695,  251.46000671,  20.15999985,   5.77236032, 'Small', -0.27000001)]

with a shape of (500L,)

But I need datastr to have shape (500L, 7L), just as the all-float example had shape (500L, 6L).

How do I do this?

Thanks

marc_s

  • Look at the `dtype` of that `datastr`. You've created a 1d structured array with 7 fields. That's the only way you can hold a mix of float and string 'columns'. You access fields by name, `datastr['c3']`. If you don't like that mix, consider loading the file twice, once to get the 6 float columns, and once to get the string one. You'll get the same data, but in 2 arrays. – hpaulj Jul 05 '17 at 04:11
  • Thanks. I want to use the data in an AdaBoost classifier. For `cross_val_score` I need to specify an input dataset. When all the data is in one data structure I just specify `clf = AdaBoostClassifier(n_estimators=100); scores = cross_val_score(clf, data, target_final); c = scores.mean()`. But with two datasets, how do I supply them to `cross_val_score`? As you can see, I am a novice. – Confused Jul 05 '17 at 17:06
  • What kind of array can the classifier take? Can it take a structured array, or must it be a 2d array with uniform dtype (e.g. all floats)? How does it handle a mix of string and floats? – hpaulj Jul 05 '17 at 17:26
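
The two-pass loading suggested in the comments above could be sketched roughly as follows. This is only an illustration under some assumptions: it is written for Python 3 with a recent NumPy, it reads two sample rows (taken from the output in the question) from an in-memory string instead of `data.csv`, and it assumes the classifier wants an all-numeric 2-D array, so the string column is replaced by integer codes. The names `CSV_TEXT`, `floats`, `labels`, `codes` and `data7` are made up for the example.

```python
import io
import numpy as np

# Two sample rows copied from the question's output; the real file has 500 rows.
CSV_TEXT = u"""3,94.6,180.18,32.89,6.80685824,Large,0.37
3,133.2,251.46,20.16,5.77236048,Small,-0.27
"""

# Pass 1: the six float columns as an ordinary 2-D float array.
floats = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',',
                    usecols=(0, 1, 2, 3, 4, 6))

# Pass 2: the string column (index 5) on its own, as a 1-D array of strings.
labels = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',',
                    usecols=(5,), dtype=str)

# Assumption: the model needs numbers, so map each distinct string
# to an integer code (codes follow the sorted order of the strings).
categories, codes = np.unique(labels, return_inverse=True)

# Append the codes as a seventh numeric column (note: it ends up last,
# not at index 5 as in the file).
data7 = np.column_stack([floats, codes])

print(data7.shape)
print(categories)
```

An array like `data7` has a uniform float dtype and two dimensions, so it could presumably be passed wherever the all-float `data` array was used before. Whether integer codes are a sensible encoding for the string column depends on the model.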

1 Answer

Each element of datastr is of type <type 'numpy.void'>; you can find some information about it here.
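
As a rough illustration of what that means, the sketch below builds a small structured array and inspects it. It is not part of the original answer and makes a few assumptions: Python 3 with a recent NumPy, two sample rows from the question read from an in-memory string rather than `data.csv`, and `'U10'` (unicode) in place of the question's `'S10'` so that the string field compares equal to ordinary Python strings. `CSV_TEXT`, `float_names` and `floats2d` are names invented for the example.

```python
import io
import numpy as np

# Two sample rows copied from the question's output.
CSV_TEXT = u"""3,94.6,180.18,32.89,6.80685824,Large,0.37
3,133.2,251.46,20.16,5.77236048,Small,-0.27
"""

names = ('c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7')
# 'U10' instead of 'S10' is a choice made for this Python 3 sketch.
formats = ('f4', 'f4', 'f4', 'f4', 'f4', 'U10', 'f4')

datastr = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',',
                     dtype={'names': names, 'formats': formats})

print(datastr.shape)      # 1-D: one record per row, not (rows, 7)
print(type(datastr[0]))   # each record is a numpy.void
print(datastr['c6'])      # a whole "column" is accessed by field name

# The float fields can be stacked into a regular 2-D array;
# the string field cannot share that array's float dtype.
float_names = [n for n in names if n != 'c6']
floats2d = np.column_stack([datastr[n] for n in float_names])
print(floats2d.shape)
```

The array stays one-dimensional because each row is a single record with seven named fields, so a shape of (rows, 7) is only possible once every column has the same dtype.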

hxysayhi