When I convert data from a pandas DataFrame into arrays for sklearn so I can make predictions, string data becomes a problem. I used LabelEncoder, but it seems to limit me to passing the encoded values instead of the original strings.
In sklearn's predict method, I want to predict on this input:
learn_to_machine=dtc.fit(X,Y)
test=[
[128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
[512, 8, 65, 'mobile_phone', 'Huawei',5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
rather than this one:
learn_to_machine=dtc.fit(X,Y)
test=[
[128, 6 ,50, 0, 2, 6000],
[512, 8, 65, 0, 3,5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
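To make the relationship between the strings and the integer codes concrete, here is a small sketch of what LabelEncoder does to one column. The brand list is made up for illustration, so the codes it produces will not match the ones from my real table.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical brand values; the real ones come from the 'Brand' column.
brands = ["Samsung", "Huawei", "Apple", "Samsung"]

brand_enc = LabelEncoder()
codes = brand_enc.fit_transform(brands)

# The classes are stored sorted; each string's code is its index in that list.
print(brand_enc.classes_.tolist())                  # ['Apple', 'Huawei', 'Samsung']
print(codes.tolist())                               # [2, 1, 0, 2]

# The fitted encoder can translate in both directions.
print(brand_enc.transform(["Huawei"]).tolist())     # [1]
print(brand_enc.inverse_transform([2]).tolist())    # ['Samsung']
```

So a code such as 2 only has meaning together with the encoder that produced it.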
If it helps, here's all my code:
import sqlalchemy
import pandas as pd
read_engine=sqlalchemy.create_engine('mysql+mysqlconnector://root:@localhost/six')
conn = read_engine.connect()
df_new=pd.read_sql_table('mobile1' ,con= conn )
df_new['price']=df_new['price'].astype(int)
df_new['ram']=df_new['ram'].astype(int)
df_new['battery']=df_new['battery'].astype(int)
df_new['size']=df_new['size'].astype(float)
df_new['camera']=df_new['camera'].mask(df_new['camera'] == '')
df_new['camera']=df_new['camera'].mask(df_new['camera'] == ' ')
df_new['camera']=df_new['camera'].fillna(0)
df_new['camera']=df_new['camera'].astype(float)
X=df_new[['ram','size','camera','product','Brand','battery']].values
Y=df_new[['price']].values
from sklearn import preprocessing
# One encoder per string column, so each column's mapping stays available later
brand_enc=preprocessing.LabelEncoder()
X[:,4]=brand_enc.fit_transform(X[:,4])
product_enc=preprocessing.LabelEncoder()
X[:,3]=product_enc.fit_transform(X[:,3])
from sklearn import tree
dtc=tree.DecisionTreeClassifier()
learn_to_machine=dtc.fit(X,Y)
# Predicting with the encoded values works:
test=[
[128, 6 ,50, 0, 2, 6000],
[512, 8, 65, 0, 3,5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
But when I try to run it with this input:
test=[
[128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
[512, 8, 65, 'mobile_phone', 'Huawei',5000]
]
this error is raised:
ValueError: could not convert string to float: 'mobile_phone'
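For a version that runs without the database, here is a minimal sketch that reproduces the error and shows the manual encoding step I would like to avoid writing by hand. The rows, the prices, and the `encode_rows` helper are all made up for illustration; it assumes one fitted LabelEncoder is kept per string column and that every string in the test rows also appeared during fitting.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows: ram, size, camera, product, Brand, battery
rows = [
    [128, 6.0, 50.0, "mobile_phone", "Samsung", 6000],
    [512, 8.0, 65.0, "mobile_phone", "Huawei", 5000],
    [64, 5.5, 12.0, "mobile_phone", "Apple", 3000],
]
prices = [11399000, 15304000, 9000000]

# One encoder per string column, keyed by column index.
encoders = {3: LabelEncoder(), 4: LabelEncoder()}
for col, enc in encoders.items():
    enc.fit([row[col] for row in rows])

def encode_rows(raw_rows):
    """Return copies of raw_rows with the string columns replaced by their codes."""
    encoded = []
    for row in raw_rows:
        row = list(row)
        for col, enc in encoders.items():
            row[col] = int(enc.transform([row[col]])[0])
        encoded.append(row)
    return encoded

model = DecisionTreeClassifier(random_state=0).fit(encode_rows(rows), prices)

test = [
    [128, 6, 50, "mobile_phone", "Samsung", 6000],
    [512, 8, 65, "mobile_phone", "Huawei", 5000],
]

# Passing the raw strings straight to predict fails.
try:
    model.predict(test)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Encoding the same rows first works.
answer = model.predict(encode_rows(test))
print(answer.tolist())
```

With this toy data the second call returns the prices of the two matching training rows. What I am looking for is a way to get that result while still passing the string values to predict.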