I am new to the k-means clustering method. I am trying to cluster a one-dimensional array of strings in Python.

Below is my data:

from sklearn.cluster import KMeans

expertise = [
    'Bioactive Surfaces and Scaffolds for Regenerative Medicine',
    'Drug/gene delivery science',
    'RNA nanomedicine', 'Immuno/bio/nano-engineering', 'Biomaterials', 'Nanomedicine',
    'Biobased Chemicals and Polymers',
    'Membranes Science & Technology',
    'Modeling of Infectious and Lifestyle-related Diseases',
]

km = KMeans(n_clusters=2)
km.fit(expertise)  # fails: KMeans expects numeric input, not raw strings

and I get ValueError: could not convert string to float:

So I wonder how to apply k-means to string data, or whether there is a way to convert the data to two dimensions?

tttthomasssss
AAron
  • What is a cluster of strings supposed to mean? – polku Aug 09 '16 at 13:24
  • I have tried k-means on coordinate data and it works perfectly, so I wonder whether it works on string data or not. – AAron Aug 09 '16 at 13:30
  • Well, precisely: this is not 'string data', just strings. It's certainly possible to make 'clusters of strings' if you find a way to get data from them (like using the Hamming distance or something like that), but sklearn can't do that for you; NLTK may have that kind of thing. – polku Aug 09 '16 at 13:43
  • Possible duplicate of [Clustering text documents using scikit-learn kmeans in Python](http://stackoverflow.com/questions/27889873/clustering-text-documents-using-scikit-learn-kmeans-in-python) – Has QUIT--Anony-Mousse Aug 09 '16 at 19:29

2 Answers

You will first have to define how you want to cluster your data. scikit-learn's KMeans is designed to work on numbers. However, scikit-learn can also be used to cluster documents by topic using a bag-of-words approach. The features extracted from the text are stored in a scipy.sparse matrix instead of a standard NumPy array.

A demo example is given here: http://scikit-learn.org/stable/auto_examples/text/document_clustering.html
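
As a rough sketch of the bag-of-words idea applied to the data from the question (not the code of the linked demo): TfidfVectorizer turns each string into a sparse numeric vector, and KMeans can then be fit on that matrix. The choice of TF-IDF weighting, stop_words="english", n_init=10 and random_state=0 are my own assumptions for illustration. With strings this short and with so few shared words, the resulting clusters may not be very meaningful.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

expertise = [
    'Bioactive Surfaces and Scaffolds for Regenerative Medicine',
    'Drug/gene delivery science',
    'RNA nanomedicine', 'Immuno/bio/nano-engineering', 'Biomaterials', 'Nanomedicine',
    'Biobased Chemicals and Polymers',
    'Membranes Science & Technology',
    'Modeling of Infectious and Lifestyle-related Diseases',
]

# Convert each string into a TF-IDF weighted bag-of-words vector.
# The result is a scipy.sparse matrix with one row per string.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(expertise)

# KMeans accepts the sparse matrix directly.
# n_init and random_state are set only to make the run reproducible.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

for label, text in zip(labels, expertise):
    print(label, text)
```

Each string gets a cluster label (0 or 1); which strings end up together depends on the words they share.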

Nik391

There is almost no sense in what you are trying to do. What do you think the two clusters should look like?

If you can't plot data you won't be able to cluster it. Find a way to represent the strings numerically (for example by length or by the occurrence of letters, depending on what you want to get) and then cluster this numerical data.
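
A minimal sketch of that idea, assuming three hand-picked features per string (character length, word count, and fraction of vowels among the letters). The helper string_features and the feature choice are hypothetical, made up only for illustration. Features like these describe the surface form of a string, not its meaning, so whether the clusters are useful depends entirely on what you want to get.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

expertise = [
    'Bioactive Surfaces and Scaffolds for Regenerative Medicine',
    'Drug/gene delivery science',
    'RNA nanomedicine', 'Immuno/bio/nano-engineering', 'Biomaterials', 'Nanomedicine',
    'Biobased Chemicals and Polymers',
    'Membranes Science & Technology',
    'Modeling of Infectious and Lifestyle-related Diseases',
]


def string_features(s):
    """Hypothetical helper: map a string to a small numeric feature vector."""
    letters = [c for c in s.lower() if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    # Features: length in characters, number of words, fraction of vowels.
    return [len(s), len(s.split()), vowels / max(len(letters), 1)]


# One row per string, one column per feature.
X = np.array([string_features(s) for s in expertise], dtype=float)

# Put the features on a comparable scale so length does not dominate.
X_scaled = StandardScaler().fit_transform(X)

# n_init and random_state are set only to make the run reproducible.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

for label, text in zip(labels, expertise):
    print(label, text)
```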

askorek