I have a program that needs to process a CSV file and convert it into a dataset. The example I am working from is the popular Python tutorial that uses the iris data set. I am trying to replace datasets.load_iris() with code that reads the CSV file 'A1-dm.csv'.

Expected:

The program processes the CSV file and loads the data.

Actual:

Traceback (most recent call last):
  File ".\example.py", line 38, in <module>
    main()
  File ".\example.py", line 11, in main
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1134, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 768, in floatconv
    return float(x)
ValueError: could not convert string to float: 'A1'

The code for this implementation is:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from MDLP import MDLP_Discretizer

def main():

    ######### USE-CASE EXAMPLE #############

    #read dataset
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
    X, y = dataset['A1'], dataset['Class']
    # feature_names, class_names = dataset['feature_names'], dataset['target_names']
    # numeric_features = np.arange(X.shape[1])  # all features in this dataset are numeric. These will be discretized

    # #Split between training and test
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    # #Initialize discretizer object and fit to training data
    # discretizer = MDLP_Discretizer(features=numeric_features)
    # discretizer.fit(X_train, y_train)
    # X_train_discretized = discretizer.transform(X_train)

    # #apply same discretization to test set
    # X_test_discretized = discretizer.transform(X_test)

    # #Print a slice of original and discretized data
    # print('Original dataset:\n%s' % str(X_train[0:5]))
    # print('Discretized dataset:\n%s' % str(X_train_discretized[0:5]))

    # #see how feature 0 was discretized
    # print('Feature: %s' % feature_names[0])
    # print('Interval cut-points: %s' % str(discretizer._cuts[0]))
    # print('Bin descriptions: %s' % str(discretizer._bin_descriptions[0]))

if __name__ == '__main__':
    main()

A sample of the CSV file is:

A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2

How can I solve this?

halfer
Evan Gertis
  • Does this answer your question? [How to load csv file as a dataset for processing with sklearn](https://stackoverflow.com/questions/69629164/how-to-load-csv-file-as-a-dataset-for-processing-with-sklearn) – Patrick Oct 19 '21 at 10:32

1 Answer

The first line of your CSV file is a header containing text, so np.loadtxt fails when it tries to convert 'A1' to a float. Skip that line (for example, by passing skiprows=1 to np.loadtxt) so that only the numeric rows are converted.
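
As a minimal sketch of that fix, the snippet below passes skiprows=1 to np.loadtxt and then splits the columns by position. It reads a few rows of the question's sample through io.StringIO rather than the real 'A1-dm.csv', and load_features_and_labels is a hypothetical helper name, not part of NumPy or scikit-learn. It also assumes the last column (Class) is the label and the remaining columns are features.

```python
import io
import numpy as np

# A few rows copied from the sample in the question.
# io.StringIO stands in for the file on disk (an assumption for this sketch).
SAMPLE_CSV = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
"""

def load_features_and_labels(source):
    """Hypothetical helper: skip the header row, then split features from labels."""
    data = np.loadtxt(source, delimiter=',', skiprows=1)
    X = data[:, :-1]              # every column except the last (A1, A2, A3)
    y = data[:, -1].astype(int)   # last column (Class), assumed to be the label
    return X, y

X, y = load_features_and_labels(io.StringIO(SAMPLE_CSV))
print(X.shape, y)
```

With the real file you would pass the path instead of the StringIO object. Note that np.loadtxt returns a plain 2-D array here, so columns are selected by position; an expression such as dataset['A1'] would not work on it.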

Please check this out: numpy loadtxt skip first row
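
If you would rather keep the name-based access used in the question (dataset['A1'], dataset['Class']), one option is np.genfromtxt with names=True, which reads the header row as field names and returns a structured array. The following is only a sketch: the rows come from the question's sample via io.StringIO instead of the real file, the variable names are illustrative, and treating Class as the label column is an assumption.

```python
import io
import numpy as np

# Rows copied from the sample in the question; StringIO replaces the real file here.
SAMPLE_CSV = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
"""

# names=True takes the field names from the first line instead of parsing it as data.
dataset = np.genfromtxt(io.StringIO(SAMPLE_CSV), delimiter=',', names=True)

# Every field except 'Class' is treated as a feature (assumption for this sketch).
feature_names = [name for name in dataset.dtype.names if name != 'Class']

# Stack the named columns into an ordinary 2-D float array.
X = np.column_stack([dataset[name] for name in feature_names])
y = dataset['Class'].astype(int)

print(feature_names)
print(X.shape, y)
```

The structured array supports dataset['A1'] directly, while the stacked X is a regular 2-D array of the kind train_test_split accepts.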

Martin Tovmassian