I have a dataset of 6-dimensional data points. I want to build a self-organizing map (SOM) for it to see how the data is clustered and how many distinct clusters it contains. My dataset is unlabeled, and all the examples I have come across use labelled data (for example, the iris dataset). I have tried several Python packages (minisom, sompy, susi) to implement the SOM, but I am unable to visualize and interpret the results.

I would appreciate help with this, and in particular a link to good work on SOM-based clustering of data with more than 3 dimensions that includes a proper evaluation of the results.

MORE INFO:

Thanks. I was able to understand the U-matrix. However, I am still struggling to cluster similar data points.

This is a sample of the dataset:

A  B         C       D          E           F
1  0.000613  150386  20.279685  39400220.0  0.672270
1  0.000649  154428  21.069894  8444300.0   0.466464
1  0.000276  154017  20.890017  12361590.0  0.399357
1  0.000186  68675   20.419599  13973180.0  0.430975
1  0.000177  60795   23.276564  5686630.0   0.372155

This is the result of the SOM clustering:

A    B             C    D          E        F         Cluster-id
5    1.096415e-07  274  12.599589  4870.0   0.000060  19
5    1.185185e-07  205  12.108413  10000.0  0.000402  19
5    1.131892e-07  221  12.282051  290.0    0.000014  19
5    1.447471e-07  338  12.708078  1750.0   0.000027  19
5    8.218939e-08  244  12.000000  30.0     0.000027  19
...  ...           ...  ...        ...      ...       ...
5    2.425165e-08  26   12.517500  2020.0   0.000025  19
5    2.926305e-08  51   12.051724  2320.0   0.000012  19
5    2.326685e-08  18   11.724138  290.0    0.000009  19
5    2.465502e-08  18   12.288000  2500.0   0.000018  19
5    5.118597e-08  80   11.776271  2950.0   0.000093  19

In the result above, attributes C and E vary significantly compared to the other attributes, even though all rows belong to the same cluster. What is a plausible reason for this?

And how can I fix this so that each cluster contains similar data points? (FYI: I applied standard scaling to the dataset to equalize the variance of each attribute.)

1 Answer

With susi, this works as follows (taken from susi/SOMClustering.ipynb):

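import matplotlib.pyplot as plt
import susi

som = susi.SOMClustering()
som.fit(X)  # X is your dataset without labels

# get the SOM node (grid position) assigned to each data point
clusters = som.get_clusters(X)

# plot the data points on the SOM grid
# (no `c=y` here: an unlabeled dataset has no y to color by)
plt.scatter(x=[c[1] for c in clusters], y=[c[0] for c in clusters], alpha=0.2)
plt.gca().invert_yaxis()
plt.show()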
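
To get one cluster id per data point and to check how similar the points in a cluster really are, something like the following may help. It is only a sketch: it assumes `clusters` is a list of (row, column) grid positions as used in the plot above, and `summarize_clusters`, `X_scaled`, `bmus` and `n_columns` are names made up for this example (`n_columns` should be the width of the grid you trained with). The spread is computed on the scaled data rather than the raw values, since that is the space the SOM works in. A column such as C or E can look very variable in raw units inside one cluster and still be tight relative to its range over the whole dataset.

```python
import numpy as np
import pandas as pd


def summarize_clusters(X_scaled, bmus, n_columns):
    """Flatten (row, column) grid positions to ids and measure per-cluster spread.

    X_scaled  : 2-D array of the standard-scaled data that was fed to the SOM
    bmus      : one (row, column) grid position per data point (assumed format)
    n_columns : number of columns of the SOM grid (hypothetical parameter)
    """
    bmus = np.asarray(bmus)
    # one integer id per grid node: row-major index into the grid
    cluster_id = bmus[:, 0] * n_columns + bmus[:, 1]

    names = [f"f{i}" for i in range(np.asarray(X_scaled).shape[1])]
    df = pd.DataFrame(X_scaled, columns=names)
    df["cluster_id"] = cluster_id

    # population standard deviation of every feature inside each cluster
    spread = df.groupby("cluster_id").std(ddof=0)
    return cluster_id, spread


# toy data standing in for the real scaled dataset and SOM output
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(6, 3))
bmus = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1)]

cluster_id, spread = summarize_clusters(X_scaled, bmus, n_columns=2)
print(cluster_id)
print(spread)
```

Since the input is standard-scaled, each feature has a standard deviation of about 1 over the whole dataset, so per-cluster values well below 1 suggest the cluster is tight in that feature even if the raw numbers look far apart.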

Does that work for you? If not, please give us more information about your data.

Disclaimer: I am the developer of susi.

felice