
I am using the Java Encog machine learning library to run k-means clustering. The problem is that it only works on numeric data. Is there a way to vectorize a text file (the data) so that I can feed it directly into the k-means clustering algorithm?

I am new to this space. Any help is appreciated.

jh314
lost Coder
  • This could be better suited for http://datascience.stackexchange.com/. However, you need to explain what your entities are and which distance measure you want to use. If you have several texts and want to measure the similarity between them, one common practice (as far as I know) is to collect all words from all texts (say there are 100 distinct words) and then create one 100-dimensional vector for each text. This vector has a "1" for each word that appears in the text and a "0" for all other entries. – Marco13 May 17 '15 at 00:34
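The binary word-presence encoding described in the comment above could be sketched roughly as follows. This is only an illustration under some assumptions: the class name `BagOfWords`, its method names, and the simple tokenizer (lowercasing and splitting on anything that is not a letter or digit) are all hypothetical choices, not part of Encog. The resulting `double[][]` rows would still need to be wrapped in whatever dataset type the clustering library expects, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class: binary bag-of-words vectors for a list of texts.
final class BagOfWords {

    // Split a text into lowercase word tokens (a deliberately simple tokenizer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect every distinct word across all texts, in first-seen order.
    static List<String> buildVocabulary(List<String> texts) {
        Set<String> vocabulary = new LinkedHashSet<>();
        for (String text : texts) {
            vocabulary.addAll(tokenize(text));
        }
        return new ArrayList<>(vocabulary);
    }

    // One row per text, one column per vocabulary word:
    // 1.0 if the word occurs in the text, 0.0 otherwise.
    static double[][] vectorize(List<String> texts, List<String> vocabulary) {
        double[][] vectors = new double[texts.size()][vocabulary.size()];
        for (int i = 0; i < texts.size(); i++) {
            Set<String> words = new HashSet<>(tokenize(texts.get(i)));
            for (int j = 0; j < vocabulary.size(); j++) {
                vectors[i][j] = words.contains(vocabulary.get(j)) ? 1.0 : 0.0;
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("the cat sat", "the dog sat", "a bird flew");
        List<String> vocabulary = buildVocabulary(texts);
        double[][] vectors = vectorize(texts, vocabulary);
        System.out.println(vocabulary);
        for (double[] vector : vectors) {
            System.out.println(Arrays.toString(vector));
        }
    }
}
```

With this encoding, two texts that share many words end up close together under Euclidean distance, which is what k-means typically uses. For larger corpora, word counts or TF-IDF weights are common alternatives to the plain 0/1 entries.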

1 Answer


Try using an ObjectOutputStream, as follows:

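import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject("YOUR_TEXT"); // serialize the text (may throw IOException)
oos.close();
byte[] bytes = baos.toByteArray(); // the serialized bytes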
Alaa Abuzaghleh