My goal is to analyze an ndarray created by the function sklearn.model_selection.train_test_split. There are actually two such ndarrays. For the first one I ran:
N_sample.dtype.name
which returns 'object', and the same goes for
R_sample.dtype.name
Let me explain how I obtained the data. I copied all the text from here and pasted it into an empty file named iris.cvs, which I saved in the same folder as my project. Then I wrote this Python script:
import numpy
import pandas
from sklearn import model_selection
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv('iris.cvs', names=names)
array = dataset.values
N = array[:,0:4]
R = array[:,4]
N_sample, N_test, R_sample, R_test = model_selection.train_test_split(N, R, test_size=0.2, random_state=7)
Therefore, I have:
N_sample.shape
which returns (120, 4), and
R_sample.shape
which returns (120,).
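A likely reason both arrays report 'object': dataset.values packs four float columns and one string column into a single array, and the only dtype NumPy can use for that mix is object. Slicing the array afterwards keeps that dtype. Here is a minimal sketch of the effect, assuming a tiny made-up DataFrame in place of the iris file (the three rows are illustrative, not the real data):

```python
import pandas as pd

# Tiny stand-in for the iris file: four float columns and one string column.
df = pd.DataFrame({
    "sepal-length": [5.1, 4.9, 6.3],
    "sepal-width": [3.5, 3.0, 3.3],
    "petal-length": [1.4, 1.4, 6.0],
    "petal-width": [0.2, 0.2, 2.5],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Floats and strings share one array, so NumPy falls back to object.
mixed = df.values
mixed_dtype = mixed.dtype.name

# Slicing the mixed array keeps the object dtype.
sliced_dtype = mixed[:, 0:4].dtype.name

# Selecting the numeric columns before calling .values keeps them as floats.
numeric_dtype = df.iloc[:, 0:4].values.dtype.name

print(mixed_dtype, sliced_dtype, numeric_dtype)
```

If that is what happens here, taking the first four columns from the DataFrame before calling .values should avoid the object dtype for N.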
To rebuild a DataFrame from the training split, I used:
new_arr = numpy.column_stack((N_sample, R_sample))
dateN = pandas.DataFrame(data=new_arr, columns=names)
# names is the list of column names defined earlier in the script
The problem is that when I call:
dateN.describe()
it returns count, unique, top, and so on, but I want mean, std, and the other numeric statistics. I tried several approaches, such as casting the data, but none of them worked. For example:
pandas.to_numeric(dateN,downcast='float', errors='ignore')
but that is not possible, because to_numeric only accepts a list, tuple, 1-d array, or Series, not a whole DataFrame. I also tried:
N_sample.astype(float,casting='unsafe')
but in the end it does not change the result.
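One thing worth checking here: ndarray.astype returns a new array rather than modifying N_sample in place, so the call has no visible effect unless its result is assigned. A small sketch, assuming a made-up object array in place of the real N_sample:

```python
import numpy as np

# Hypothetical stand-in for N_sample: numbers stored in an object array.
N_sample = np.array([[5.1, 3.5], [4.9, 3.0]], dtype=object)

# The converted copy is discarded, so N_sample itself is unchanged.
N_sample.astype(float)
dtype_before = N_sample.dtype.name

# Assigning the result keeps the converted copy.
N_sample = N_sample.astype(float)
dtype_after = N_sample.dtype.name

print(dtype_before, dtype_after)
```

Even with N_sample converted, stacking it with the string labels in R_sample would probably produce an object array again, so converting the DataFrame columns after building it may be the simpler route.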
Moreover, if I do:
dateN.iloc[:,0:4] = dateN.iloc[:,0:4].apply(pandas.to_numeric, errors='coerce')
dateN.dtypes
it returns:
sepal-length    object
sepal-width     object
petal-length    object
petal-width     object
class           object
dtype: object
So nothing has changed. How can I solve this? How do I make the dataset numeric so that I can get mean, std, and so on?
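One possible approach, sketched under assumptions rather than tested on the real file: convert the four measurement columns with DataFrame.astype and assign the result back to dateN, instead of writing into an iloc slice. The arrays below are made-up stand-ins for N_sample and R_sample, and numeric_cols is a name introduced only for this illustration:

```python
import numpy as np
import pandas as pd

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Hypothetical stand-ins for N_sample and R_sample, both with object dtype.
N_sample = np.array([[5.1, 3.5, 1.4, 0.2],
                     [4.9, 3.0, 1.4, 0.2],
                     [6.3, 3.3, 6.0, 2.5]], dtype=object)
R_sample = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-virginica'], dtype=object)

dateN = pd.DataFrame(np.column_stack((N_sample, R_sample)), columns=names)

# The first four names are the measurement columns; 'class' stays as it is.
numeric_cols = names[0:4]

# astype returns a new DataFrame, so the result is assigned back to dateN.
dateN = dateN.astype({col: float for col in numeric_cols})

dtypes_after = dateN.dtypes
summary = dateN.describe()  # by default summarizes only the numeric columns
print(dtypes_after)
print(summary)
```

With the measurement columns stored as float64, describe() should report count, mean, std, min, the quartiles, and max for them, and leave the 'class' column out.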