-1

I'm a beginner in data science and I need your help. I'm trying out unsupervised machine learning with K-means, but the resulting clusters are not spherical. I normalized the data, removed the outliers, etc. I tried several ways to correct it, but none of them worked.

Here are the plots (I took a small sample of the dataset to show you; the full dataset has 8000 rows):

[two images, not included]

Armali
Thao Ly
    What are you plotting? What are the x-axis and y-axis? Please provide some code. – lsmor Jan 24 '19 at 11:44

2 Answers

2
import pandas as pd
from sklearn.decomposition import PCA

# df is the normalized 6-feature DataFrame from the question
pca = PCA(n_components=2)

# Project the 6 features onto the first two principal components
principalComponents = pca.fit_transform(df)

principalDf = pd.DataFrame(data=principalComponents, columns=['principal component 1', 'principal component 2'])

principalDf.head(5)

I used PCA to reduce the 6 dimensions to 2. In the 2-D projection the data is separated clearly.

Output: [image, not included]
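For anyone who wants to reproduce the idea end to end, here is a minimal sketch of how it could look: cluster in the full 6-dimensional space, then use PCA only for plotting. The synthetic data from make_blobs, the choice of 3 clusters, and all variable names are assumptions for illustration, since the real dataset is not shown.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 6-feature dataset (assumption: the real data is not shown)
X, _ = make_blobs(n_samples=500, n_features=6, centers=3, random_state=0)

# Standardize so that no single feature dominates the Euclidean distances
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the full 6-dimensional space (3 clusters is an assumption)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project the points and the centroids to 2-D for visualization only
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
centers_2d = pca.transform(kmeans.cluster_centers_)

# Fraction of the total variance that survives the projection
print(pca.explained_variance_ratio_.sum())
```

X_2d can then be drawn with something like plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels). If the printed variance fraction is low, the 2-D picture hides a lot of the structure, so the shape of the clusters in the plot should be read with care.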

Venkatachalam
Thao Ly
1

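Your data has 6 dimensions. You can't visualize data with more than 2 dimensions in a straightforward manner; a scatter plot of two raw features shows only part of the picture. You need a dimensionality-reduction technique such as PCA or t-SNE to project the data to 2-D before plotting it.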
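As a rough sketch of the t-SNE route, assuming scikit-learn is available: the synthetic 6-feature data and the parameter values (perplexity=30, init="pca") are placeholders for illustration, not tuned settings.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 6-feature dataset (assumption: the real data is not shown)
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Embed the 6-D points into 2-D; perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_embedded = tsne.fit_transform(X_scaled)

print(X_embedded.shape)
```

Keep in mind that t-SNE preserves local neighborhoods rather than global distances, so the size and shape of the groups in the embedding do not tell you whether the original clusters are spherical. It is useful for seeing whether groups exist, not for judging their geometry.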

Bhaskar Dhariyal