
I'd like to replace the iris data with my own data. What steps do I need to follow to do that? Thanks.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import sklearn 
from sklearn.cluster import KMeans 
from mpl_toolkits.mplot3d import Axes3D 
from sklearn.preprocessing import scale 
import sklearn.metrics as sm 
from sklearn import datasets 
from sklearn.metrics import confusion_matrix, classification_report
plt.rc('figure', figsize=(7,4))
iris = datasets.load_iris() 
X = scale(iris.data) 
Y = pd.DataFrame(iris.target) 
variable_name = iris.feature_names
X[0:10,]
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X) 
iris_df = pd.DataFrame(iris.data) 
iris_df.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
Y.columns = ['Targets']
kary

2 Answers

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import sklearn 
from sklearn.cluster import KMeans 
from mpl_toolkits.mplot3d import Axes3D 
from sklearn.preprocessing import scale 
import sklearn.metrics as sm 
from sklearn import datasets 
from sklearn.metrics import confusion_matrix,classification_report   

# CHANGED CODE START
df = pd.read_excel('tmp.xlsx') 
Y = df['target']
X = df.drop(['target'], axis=1)
# CHANGED CODE END

variable_name = X.columns 
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X) 
user2672299

The import section will stay the same.

Let's assume you have a DataFrame:

# read your data into a DataFrame (several file types are possible)
df = pd.read_csv('test.csv')
# define the target variable (named 'target' here) and the features X
Y = df['target']
X = df.drop(['target'], axis=1)
# run the k-means algorithm
clustering = KMeans(n_clusters=3, random_state=5)
clustering.fit(X)

One more thing: what are you using k-means for? It is an unsupervised learning method, so it does not use a target variable. Normally the code would look like this:

df = pd.read_csv('test.csv')
# column headers you want to use
relevant_columns = ['A', 'B']
X = df[relevant_columns]
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X)
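
The original question standardized the iris features with `scale` before clustering, and the snippets above skip that step. As a rough sketch only, the unsupervised version could look like the following. The small inline DataFrame is a made-up stand-in for your own file, the column names `A` and `B` are placeholders, and `StandardScaler` is swapped in for `scale` as an assumption about what you want.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for your own data.
# In practice you would load it, e.g. df = pd.read_csv('test.csv')
df = pd.DataFrame({
    'A': [1.0, 1.2, 0.9, 8.0, 8.3, 7.9, 4.0, 4.2, 3.9],
    'B': [10.0, 11.0, 9.5, 50.0, 52.0, 49.0, 30.0, 31.0, 29.5],
})

# column headers you want to cluster on (placeholders)
relevant_columns = ['A', 'B']

# standardize each feature to zero mean and unit variance,
# so that no column dominates the distance computation
X = StandardScaler().fit_transform(df[relevant_columns])

# n_init is set explicitly because its default differs between scikit-learn versions
clustering = KMeans(n_clusters=3, n_init=10, random_state=5)
clustering.fit(X)

# attach the cluster assigned to each row back onto the DataFrame
df['cluster'] = clustering.labels_
print(df)
```

The cluster numbers themselves are arbitrary labels, so only the grouping of rows is meaningful, not which group is called 0, 1 or 2.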
PV8
  • So you basically just steal my code and present as answer. – user2672299 Aug 28 '19 at 12:21
  • your code is not working...the part with `iris..` is wrong and I added a second part to it... – PV8 Aug 28 '19 at 12:22
  • You just removed the line that did not work and for the second paragraph you copied the code and removed more lines. – user2672299 Aug 28 '19 at 12:24
  • the code doesn't work. I got this error : pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 11, saw 3 – kary Aug 28 '19 at 13:52
  • The code that reads the DataFrame depends on your data; you have to adjust the arguments of `pd.read_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html – PV8 Aug 29 '19 at 05:31
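
A `ParserError` of the form "Expected 1 fields in line N, saw M" often means the file is not laid out the way the default `pd.read_csv` settings assume. One common cause is a delimiter other than a comma, and another is extra lines before the real header (for which `skiprows` can help). The sketch below only illustrates the delimiter case. The semicolon-separated text is invented for the example and is not the asker's actual file.

```python
import io
import pandas as pd

# Hypothetical file contents: semicolon-separated instead of comma-separated
raw = "A;B;target\n1.0;2.0;0\n3.0;4.0;1\n5.0;6.0;0\n"

# Option 1: tell pandas the delimiter explicitly
df_explicit = pd.read_csv(io.StringIO(raw), sep=';')

# Option 2: let the Python engine try to detect the delimiter itself
df_sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine='python')

print(df_explicit.columns.tolist())
print(df_sniffed.columns.tolist())
```

For a real file, replace `io.StringIO(raw)` with the file path. Opening the file in a text editor first is usually the quickest way to see which delimiter and how many header lines it has.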