
I have saved the results of a Google query (title and description for each of 100 results) to a CSV file. It has this format:

Title                Description
Spain - Wikipedia    Spain is a democracy organised in the form of a parliamentary government under a constitutional monarchy. It is a developed country with the world's fourteenth

You get the idea. I can load this CSV file into Weka successfully. I first apply the NominalToString filter (because the attributes load as nominal), and then apply StringToWordVector with the following options:

IDFTransform - True
TFTransform - True
normalize - True
outputWordCounts - True
tokenizer - AlphabeticTokenizer
wordsToKeep - 100
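
For reference, here is a rough sketch in plain Python (not Weka) of the kind of weighting these options describe: word counts, a log TF transform, an IDF transform, and length normalization. The function names `tokenize` and `tfidf_vectors` are made up for illustration. Lowercasing, unit-length normalization, and the way the top words are chosen are simplifying assumptions, so the numbers will not match Weka's output exactly.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep alphabetic runs only. Lowercasing is an added simplification.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs, words_to_keep=100):
    # Raw word counts per document.
    counts = [Counter(tokenize(d)) for d in docs]

    # Document frequency: in how many documents each word appears.
    df = Counter(word for c in counts for word in c)

    # Keep the most frequent words overall (a simplified stand-in for wordsToKeep).
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = sorted(word for word, _ in total.most_common(words_to_keep))

    n = len(docs)
    vectors = []
    for c in counts:
        # log(1 + count) is the TF part, log(n / df) is the IDF part.
        vec = [math.log(1 + c[w]) * math.log(n / df[w]) for w in vocab]
        # Scale each document vector to unit length (assumed normalization).
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors


docs = [
    "Spain tourism beaches",
    "Spain football league",
    "Spain economy growth",
]
vocab, vectors = tfidf_vectors(docs)
spain_weights = [vec[vocab.index("spain")] for vec in vectors]
print(spain_weights)
```

Under this formula, a word that occurs in every document (here the query term itself) gets an IDF of log(1) = 0, so it contributes nothing to the vectors.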

That is more or less the setup. I then get a list of words; sometimes I use the NGramTokenizer instead, to get terms of at least 3 words.

After that I go to the Cluster tab and choose k-means. This doesn't work very well, as it puts 90% of the results in one cluster. Or maybe that is correct...
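
To make the "90% in one cluster" observation concrete, here is a minimal k-means sketch in plain Python (not Weka's implementation) that clusters a few points and reports the cluster sizes. The helper names `squared_distance` and `kmeans`, the random initialization, and the toy data are all assumptions for illustration.

```python
import random
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    # Index of the closest centroid to this point.
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    # Start from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the last centroids.
    return [nearest(p, centroids) for p in points]


# Toy data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(points, k=2)
print(Counter(labels))
```

Running the same size count on the real document vectors shows how lopsided the split is. With sparse, high-dimensional text vectors, one dominant cluster can come from the data representation as much as from the algorithm.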

What happens when I choose "Use training set" here, given that I don't have any training data yet? Which option should I use? I want to form clusters that correspond to categories (Tourism, Sports, Economy, ...). Can Weka do that the way Carrot2 does? Or can it at least form clusters?

Thanks.

EricJ