21

I am looking for a lightweight clustering library in Java. I don't need hundreds of clustering algorithms in that library; five to seven would be fine for me.

I am sure you are going to ask: "What kind of algorithms do you need, and for what purpose?" :) I just need to classify my data with the help of clustering, for example with k-means.

P.S.: I know about Weka, but I don't want to use it because it is not specifically a clustering library.

Matthieu
user238384
  • what do you mean with clustering? Is weka sth. for you? – Karussell Jan 24 '10 at 23:00
  • Sorry, I didn't get your answer. and what is sth? – user238384 Jan 24 '10 at 23:06
  • Note for future reference... in software development, "clustering" usually means http://en.wikipedia.org/wiki/Cluster_%28computing%29 – skaffman Jan 24 '10 at 23:11
  • @skaffman not really, see data mining or http://en.wikipedia.org/wiki/Cluster_analysis – Karussell Jan 24 '10 at 23:40
  • @agazerboy sth. means something. For weka take a look here http://en.wikipedia.org/wiki/Weka_%28machine_learning%29 So, do you mean the data mining technique? ah, okay I read your edit ... – Karussell Jan 24 '10 at 23:42
  • @Karussell: I said "usually". – skaffman Jan 25 '10 at 07:58
  • @skaffman That's entirely dependent on the context. And cluster analysis, "usually" refers to data: http://en.wikipedia.org/wiki/Cluster_analysis – sdasdadas Jun 04 '13 at 18:37
  • what's with all the downvotes on this question? – lynxoid Sep 22 '16 at 20:20
  • @lynxoid I'm not sure, but I think every answer seemed to get multiple downvotes within a few days. Pretty odd. – Binary Nerd Oct 03 '16 at 10:22
  • @lynxoid the question is clearly **off-topic**, and the answers are largely **link-only answers** (which is a reason to downvote, some links here are even dead): http://stackoverflow.com/help/deleted-answers – Has QUIT--Anony-Mousse Jan 25 '17 at 14:21
  • Links were not dead 6 years ago. A link to a "lightweight clustering library" is a reasonable answer to the question. Good luck to SO w/ their purge of old Q&As. – lynxoid Jan 25 '17 at 18:19

8 Answers

6

Take a look at org.apache.commons.math4.ml.clustering.KMeansPlusPlusClusterer in Apache's Commons Math library.
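For readers who want to see what a k-means clusterer does before committing to a library, here is a minimal plain-Java sketch of Lloyd's algorithm. It is not the Commons Math implementation or its API: the class name `SimpleKMeans`, the `cluster` signature, and the sample data are assumptions made up for illustration, and the sketch seeds centroids with plain random sampling rather than the k-means++ seeding that `KMeansPlusPlusClusterer` is named for.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal k-means (Lloyd's algorithm) sketch; all names here are illustrative. */
final class SimpleKMeans {

    /** Returns the cluster index (0..k-1) assigned to each input point. */
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        if (k < 1 || k > n || maxIterations < 1) {
            throw new IllegalArgumentException("need 1 <= k <= points.length and maxIterations >= 1");
        }
        int dim = points[0].length;
        Random random = new Random(seed);

        // Seed the centroids with k distinct input points (partial Fisher-Yates shuffle).
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(n - c);
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = nearest(points[i], centroids);
                if (best != labels[i]) { labels[i] = best; changed = true; }
            }
            if (!changed) break; // no point moved, so the result is stable

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) sums[labels[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster: keep the old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return labels;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to the point. */
    private static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) { bestDistance = distance; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two well-separated groups of 2-D points.
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        System.out.println(Arrays.toString(cluster(points, 2, 50, 42L)));
    }
}
```

The label numbers themselves are arbitrary (which group ends up as cluster 0 depends on the seed); only the grouping is meaningful. A library implementation is still the better choice for real data, since this sketch makes no attempt at smarter seeding, restarts, or empty-cluster repair.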

Mark
0

If Scala also works for you, then you might want to check this version of KMeans in Scala:

https://github.com/wspringer/kmeans

A related blog post is here:

http://nxt.flotsam.nl/k-means-clustering.html

Wilfred Springer
0

If you want some basic clustering algorithms in Java, you can check my software:

http://www.philippe-fournier-viger.com/spmf/

It offers an implementation of KMeans and a hierarchical clustering algorithm.

The other algorithms offered are for pattern mining. In total there are 47 algorithms, but only 2 of them are for clustering. There is also a simple GUI for launching the algorithms.

Phil
0

I would take a look at JUNG. It has a number of clustering algorithms implemented, although I'm not sure if K-means is one of them.

Another option might be KNIME, an Eclipse-based workflow editor. It includes a number of clustering primitives you can use as part of a workflow, including k-means.

Binary Nerd
  • For those interested, JUNG has k-means clustering: http://jung.sourceforge.net/doc/api/edu/uci/ics/jung/algorithms/util/KMeansClusterer.html – sdasdadas Jun 04 '13 at 18:34
0

Some open-source clustering algorithms in Java are available here under the GPL. They require the Java Colt library (for matrices): http://open.trickl.com/

Tim Gee
0

There is also ELKI, an open-source university project similar to WEKA, but with the focus on cluster analysis and outlier detection instead of machine learning algorithms. It's pretty advanced, uses index structures for efficiency, and has at least a dozen clustering algorithms.

Has QUIT--Anony-Mousse
-1

Apache Mahout implements many clustering algorithms, via Hadoop. It's a little heavy for what you want, but: http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html

You might also be able to dig out and adapt the user clustering code from Mahout's TreeClusteringRecommender class, which uses clustering for recommender engine purposes.

Sean Owen
-1

Cytoscape software has several plugins that implement clustering algorithms for networks and numerical data (Nemo, MCODE, clusterMaker, and so on). All plugins are open-source.

lynxoid