import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

cigdata = pd.read_csv('cigs.csv')
print(cigdata.head())
print(cigdata.shape)
# Convert the pandas DataFrame to NumPy arrays (first 8 columns = features, 9th column = target)
X = cigdata.iloc[:, :8].values
y = cigdata.iloc[:, 8:9].values
# Standardize the features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

I am trying to normalize my data before applying a naïve Bayes algorithm, but it fails because some columns in my dataset contain string values. The error is something like this:

ValueError: string value cannot be converted to float.
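To see where the error comes from, here is a minimal reproduction on a tiny in-memory frame instead of `cigs.csv`. It is a sketch under assumptions: the first row is the sample row posted in the comments, the second row is an invented placeholder, and the exact wording of the message depends on the installed NumPy/scikit-learn versions, so only the exception type is relied on.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Row 1 is the sample row from the comments; row 2 is a made-up placeholder
# with the same 9-column layout.
df = pd.DataFrame([
    ["Alpine", "Lt", 15, 1.1, 16, 100, "F", "SP", "yes"],
    ["BrandB", "Reg", 12, 0.9, 14, 85, "S", "HP", "no"],
])

# Mixed text and numbers end up in an object-dtype NumPy array.
X = df.iloc[:, :8].values

raised = False
try:
    # StandardScaler has to cast X to float, which fails on "Alpine", "Lt", ...
    StandardScaler().fit_transform(X)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Any column that still holds text when it reaches `fit_transform` triggers this, regardless of how many numeric columns surround it.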

double-beep
  • Ok, so the problem seems to be that you are reading some data from a CSV which Python interprets as strings (probably because the values are wrapped in "" in the CSV). Could you, however, provide the exact error message? "its showing something like this valuetype error string value cannot be converted to float" is not very useful for narrowing down the possible cause of an error. – mandulaj Jan 03 '21 at 08:15
  • X = sc.fit_transform(X): this is the line that gives me the error, sir. Some columns in my dataset contain string values, so it raises a ValueError saying that a string value cannot be converted to float. – Salman Mehfooz Jan 03 '21 at 08:23
  • Could you also provide `cigs.csv`? If it's large, at least a sample of it. – mandulaj Jan 03 '21 at 08:27
  • Alpine Lt 15 1.1 16 100 F SP yes – Salman Mehfooz Jan 03 '21 at 08:34
  • The values above are from the first row; it has 9 columns in total. – Salman Mehfooz Jan 03 '21 at 08:39
  • Sir, can you give me your email address? Then I could share the file with you and you could check the error easily. – Salman Mehfooz Jan 03 '21 at 08:42
  • So your CSV contains some data, but as you can see, the 1st, 2nd and 7th to 9th columns are not numeric; they are strings of text. This is what `StandardScaler` is complaining about. You have to convert them into numeric values (for example yes=1, no=0, etc.). Check https://pbpython.com/categorical-encoding.html – mandulaj Jan 03 '21 at 08:47
  • This conversion process is very lengthy, sir. Is there any other way, i.e. a function that can be used for string values? – Salman Mehfooz Jan 03 '21 at 09:09
  • There is an answer provided below for converting string categories to numbers. If you have specific issues related to that answer, please discuss them in the comments there. If you are facing a different issue, please ask a new question. – mandulaj Jan 03 '21 at 12:01
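The conversion suggested in the comments does not have to be written column by column. The sketch below is an illustration, not code from the thread: it uses an explicit `map` for the yes/no column (the yes=1, no=0 idea from the comment) and `pd.get_dummies` (one-hot encoding, a technique the thread does not use) for the remaining text columns. The toy rows and the integer column labels `0`..`8` are assumptions standing in for the real `cigs.csv` header.

```python
import pandas as pd

# Toy frame in the question's 9-column layout; row 1 is from the comments,
# row 2 is an invented placeholder. Column labels are the positions 0..8.
df = pd.DataFrame([
    ["Alpine", "Lt", 15, 1.1, 16, 100, "F", "SP", "yes"],
    ["BrandB", "Reg", 12, 0.9, 14, 85, "S", "HP", "no"],
])

# Binary yes/no column: an explicit mapping keeps the meaning obvious.
df[8] = df[8].map({"yes": 1, "no": 0})

# Other text columns: one 0/1 indicator column per distinct value.
encoded = pd.get_dummies(df, columns=[0, 1, 6, 7], dtype=int)
print(encoded.dtypes)
```

One-hot encoding avoids imposing an artificial order on categories such as brand names, at the cost of adding one column per distinct value.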


You can use `LabelEncoder` from sklearn.

In your case, the categorical columns are at indices [0, 1, 6, 7, 8]. You can encode them using this code:

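from sklearn.preprocessing import LabelEncoder

# ... load cigdata with pd.read_csv('cigs.csv') as in the question ...

# Positions of the text (categorical) columns, including the yes/no target in column 8
cat_index = [0, 1, 6, 7, 8]
# Fit a LabelEncoder on each of those columns and write the integer codes back
cigdata.iloc[:, cat_index] = cigdata.iloc[:, cat_index].apply(LabelEncoder().fit_transform)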

Place this right after the `read_csv` call and you should be left with a DataFrame containing only numerical values.
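For illustration only (not part of the original answer), here is a sketch of the whole pipeline from the question with the encoding step in place: label-encode the text columns, split features and target, standardize, and fit a naïve Bayes model. Assumptions: a small in-memory frame replaces `cigs.csv` (only the first row is real data from the comments, the rest is invented), the encoding is written as an equivalent column-by-column loop that keeps each fitted encoder, and `GaussianNB` is chosen as the naïve Bayes variant.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Stand-in for cigs.csv: row 1 comes from the comments, rows 2-4 are made up.
cigdata = pd.DataFrame([
    ["Alpine", "Lt", 15, 1.1, 16, 100, "F", "SP", "yes"],
    ["BrandB", "Reg", 12, 0.9, 14, 85, "S", "HP", "no"],
    ["BrandC", "Lt", 9, 0.7, 10, 100, "F", "HP", "yes"],
    ["BrandD", "Reg", 16, 1.2, 17, 85, "S", "SP", "no"],
])

# Encode each text column separately and keep its encoder,
# so the integer codes can be mapped back later.
cat_index = [0, 1, 6, 7, 8]
encoders = {}
for col in cigdata.columns[cat_index]:
    encoders[col] = LabelEncoder()
    cigdata[col] = encoders[col].fit_transform(cigdata[col])

# Features as floats, target as a 1-D integer array.
X = cigdata.iloc[:, :8].to_numpy(dtype=float)
y = cigdata.iloc[:, 8].to_numpy(dtype=int)

# Standardization now succeeds because every column is numeric.
X_scaled = StandardScaler().fit_transform(X)

model = GaussianNB().fit(X_scaled, y)
predictions = model.predict(X_scaled)
print(predictions)
```

Keeping one encoder per column is what makes `encoders[8].inverse_transform(...)` possible for turning predicted codes back into "yes"/"no".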

mandulaj