How can I apply this clustering algorithm to my own data?

Question

I'd like to replace the iris data with my own data. What steps do I need to follow to do that? Thanks.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import sklearn 
from sklearn.cluster import KMeans 
from mpl_toolkits.mplot3d import Axes3D 
from sklearn.preprocessing import scale 
import sklearn.metrics as sm 
from sklearn import datasets 
from sklearn.metrics import confusion_matrix, classification_report

plt.rc('figure', figsize=(7, 4))
iris = datasets.load_iris()
X = scale(iris.data)
Y = pd.DataFrame(iris.target)
variable_name = iris.feature_names
X[0:10,]
clustering = KMeans(n_clusters=3, random_state=5)
clustering.fit(X)
iris_df = pd.DataFrame(iris.data)
iris_df.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
Y.columns = ['Targets']

A sample of your data and its labels would help us give you an answer. – idkman Aug 28 '19 at 11:52

user2672299 · Answer 1 · 2019-08-28T12:19:59.307

0

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import sklearn 
from sklearn.cluster import KMeans 
from mpl_toolkits.mplot3d import Axes3D 
from sklearn.preprocessing import scale 
import sklearn.metrics as sm 
from sklearn import datasets 
from sklearn.metrics import confusion_matrix,classification_report   

# CHANGED CODE START
df = pd.read_excel('tmp.xlsx') 
Y = df['target']
X = df.drop(['target'], axis=1)
# CHANGED CODE END

variable_name = X.columns 
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X)
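
Once the model is fitted, the cluster assignments can be read from the estimator. A minimal sketch, assuming a small made-up DataFrame in place of `tmp.xlsx` (the column names `A`, `B` and `target` and all values are placeholders, not your data):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for pd.read_excel('tmp.xlsx'); replace with your own data.
df = pd.DataFrame({
    'A': [1.0, 1.2, 0.8, 8.0, 8.3, 7.9],
    'B': [2.0, 1.9, 2.2, 9.0, 9.1, 8.8],
    'target': [0, 0, 0, 1, 1, 1],
})
X = df.drop(['target'], axis=1)

# n_init is set explicitly because its default differs between scikit-learn versions.
clustering = KMeans(n_clusters=2, n_init=10, random_state=5)
clustering.fit(X)

# labels_ holds one cluster index per row; cluster_centers_ has one row per cluster.
df['cluster'] = clustering.labels_
centers = clustering.cluster_centers_
print(df)
print(centers)
```

The cluster indices are arbitrary, so cluster 0 does not have to correspond to target 0.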

edited Aug 28 '19 at 12:19

answered Aug 28 '19 at 12:12


This does not work because of `variable_name = iris.feature_names X[0:10,]` – PV8 Aug 28 '19 at 12:18
I fixed this line. – user2672299 Aug 28 '19 at 12:22

score 0 · Answer 2 · answered Aug 28 '19 at 12:17

The import section will stay the same.

Let's assume you have a DataFrame:

# read your data into a DataFrame (several file types are possible)
df = pd.read_csv('test.csv')
# define the target variable (named 'target' in my case) and the features X
Y = df['target']
X = df.drop(['target'], axis=1)
# the k-means algorithm starts here
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X)

Let me add one more thing: what are you using k-means for? It is an unsupervised learning method, so it does not use a target variable. Normally it should look like this:

df = pd.read_csv('test.csv')
# column headers you want to use
relevant_columns = ['A', 'B']
X = df[relevant_columns]
clustering = KMeans(n_clusters=3,random_state=5) 
clustering.fit(X)
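
The original iris script standardized the features with `scale` before clustering. That step is probably worth keeping when your columns are on very different scales, since k-means works with Euclidean distances. A sketch under stated assumptions: the DataFrame below is invented, `A`, `B` and `note` are placeholder column names, and `StandardScaler` is swapped in for `scale`:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for pd.read_csv('test.csv').
df = pd.DataFrame({
    'A': [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],
    'B': [100.0, 2000.0, 1000.0, 1500.0, 300.0, 1900.0],
    'note': ['a', 'b', 'c', 'd', 'e', 'f'],  # non-numeric column, left out below
})

# Keep only the numeric columns you want to cluster on.
relevant_columns = ['A', 'B']
X = df[relevant_columns]

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# n_init is set explicitly because its default differs between scikit-learn versions.
clustering = KMeans(n_clusters=2, n_init=10, random_state=5)
labels = clustering.fit_predict(X_scaled)
print(labels)
```

Without the scaling step, column `B` would dominate the distances simply because its values are larger.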

answered Aug 28 '19 at 12:17

PV8


So you basically just stole my code and presented it as an answer. – user2672299 Aug 28 '19 at 12:21
Your code is not working. The part with `iris..` is wrong, and I added a second part to it. – PV8 Aug 28 '19 at 12:22
You just removed the line that did not work, and for the second paragraph you copied the code and removed more lines. – user2672299 Aug 28 '19 at 12:24
The code doesn't work. I got this error: pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 11, saw 3 – kary Aug 28 '19 at 13:52
The code that reads the DataFrame depends on your data; you have to adjust `pd.read_csv` accordingly: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html – PV8 Aug 29 '19 at 05:31
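
A `ParserError` like the one quoted above often means the file does not use the comma separator that `pd.read_csv` assumes by default. A sketch, assuming a semicolon-separated file; the contents are invented, and an in-memory buffer stands in for the real file:

```python
import io
import pandas as pd

# Hypothetical file contents; a real call would pass the path, e.g. 'test.csv'.
raw = "A;B;target\n1.0;2.0;0\n1.5;1.8;0\n8.0;9.0;1\n"

# With the default sep=',' each line would be read as a single column.
df = pd.read_csv(io.StringIO(raw), sep=';')
print(df.columns.tolist())
print(df.shape)
```

If the file has extra lines before the header row, the `skiprows` parameter of `pd.read_csv` may be the one to adjust instead.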
