
I'm about to take up a data mining project. Before I jump in, I wanted to look around for data mining tools (preferably open source) that allow web-based reporting. In my scenario the data will be provided to me, so I'm not supposed to crawl for it.

In a nutshell, I am looking for a tool that provides data analysis, web-based reporting, some kind of dashboard, and mining features.

I have worked with Microsoft Analysis Services and BOXI, and of late I have been looking at Pentaho, which seems to be a good option.

Please share your experiences with any such tools you know of.

Cheers

Quamis
Arnkrishn

20 Answers


I believe WEKA is the best open-source data mining software out there.

Check it out: http://www.cs.waikato.ac.nz/ml/weka/

Alix Axel

Weka is great, but you might want to try the Orange Data Mining toolkit instead.

http://www.ailab.si/orange/

Edit: And as of November 2010, I must say I really like KNIME.

ybakos
    +1 for KNIME. I discovered this a few weeks ago, and have been very impressed with what it can do. Supports Java, Python, and R scripts, and the BIRT add-on makes writing reports a breeze. – Patrick Cuff Jan 20 '11 at 12:08
5

R has a lot of excellent packages related to data mining. In particular, look at:

It also ties into Weka (see the RWeka package), and it can be integrated with either .NET (through COM) or Python (through RPy or RPy2).

I agree about Pentaho as a reporting platform, although it's a very large project; whether that matters depends on what you're using it for.

Shane

You should also check out Apache Mahout. It can be quite useful for some large-scale machine learning tasks, such as user clustering.
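To show roughly what "user clustering" involves, here is a minimal single-machine k-means sketch in plain Python. It is not Mahout's API (Mahout's point is running this kind of algorithm over data too large for one machine); the `kmeans` function, the farthest-first initialisation, and the toy user vectors are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def _mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Hypothetical helper: return a cluster label (0..k-1) for each point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Farthest-first initialisation (deterministic): start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = _mean(members)
    return labels


# Toy "users" described by two behavioural features (made-up numbers).
users = [(1.0, 0.5), (0.9, 0.7), (8.0, 9.0), (8.5, 8.7)]
print(kmeans(users, k=2))  # → [0, 0, 1, 1]
```

Each iteration is the same assign-then-update loop regardless of scale; a distributed tool splits the assignment step across machines and combines the per-cluster sums to compute the new centroids.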

random.bit
    The Apache licence is the biggest plus, because the other libraries mentioned here use the GPL, which restricts some commercial use cases (such as distributing proprietary derivative works). – TomR Apr 08 '15 at 20:42

RapidMiner is my preferred data mining tool.

Andrei Sfat
Trevor Kemmer

I would try the new Google tools:

- First, get an API ID for Google Storage, which is where you will store and manipulate the data you are going to analyze.

- Then get an API ID for the Google Prediction API (http://code.google.com/apis/predict/docs/getting-started.html), which, from what I have seen, is a fantastic outsourced data mining processor. The Prediction API lets you get more out of your data and makes its patterns more accessible. Besides traditional numeric and nominal data, you can also use text data; for example, you can use the API to categorize emails by language.

- Finally, you can use BigQuery (http://code.google.com/apis/bigquery/) for ad-hoc analysis, standardized reporting, data exploration, and app prototyping.

mariana soffer

KEEL (http://keel.es) is written in Java and is well suited to applying evolutionary computation to data mining.

aliassaila

Have a look at the list of open-source machine learning software maintained by JMLR. You can find it here:

http://mloss.org/software/

http://jmlr.csail.mit.edu/mloss/

They represent the state of the art!

My issue with Weka is that a number of algorithms in it are outdated.


I believe RapidMiner is an excellent tool that should be added to this list.

mariana soffer

WEKA (already mentioned), Orange (http://orange.biolab.si/), and Tanagra (http://data-mining-tutorials.blogspot.com, where you can find good tutorials) are very good tools for data mining.

codious

You could check out my software, the SPMF data mining framework.

It is an open-source Java software that offers more than 70 algorithms for:

  • frequent itemset mining,
  • association rule mining,
  • sequential pattern mining,
  • sequential rule mining,
  • and more.
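To illustrate the first task in that list, here is a minimal Apriori-style frequent itemset miner in plain Python. It is not SPMF's API (SPMF is a Java library); the `frequent_itemsets` function and the toy shopping baskets are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def frequent_itemsets(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical helper: return every itemset contained in at least
    `min_support` transactions, mapped to its support count (Apriori, level by level)."""
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    # Level-1 candidates: every single item that occurs anywhere.
    candidates: Set[FrozenSet[str]] = {frozenset([item]) for b in baskets for item in b}
    result: Dict[FrozenSet[str], int] = {}
    size = 1
    while candidates:
        # Count the support of each candidate by scanning all baskets.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        size += 1
        # Join step: union pairs of frequent itemsets into candidates one item larger.
        # Prune step: keep a candidate only if all of its (size - 1)-subsets are frequent.
        candidates = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == size
            and all(frozenset(s) in frequent for s in combinations(a | b, size - 1))
        }
    return result


baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
freq = frequent_itemsets(baskets, min_support=3)
print(freq[frozenset({"bread", "milk"})])  # → 3
print(frozenset({"bread", "beer"}) in freq)  # → False
```

The prune step relies on the Apriori property: an itemset cannot be frequent if any of its subsets is not, so candidates can be discarded before their support is counted.
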
Phil

Pentaho is a very professional solution and definitely a very good choice.

Pascal Thivent

You can look at Data Mining SDK and its blog.

sashaeve

Some open-source data mining tools are listed here: http://dataminingtools.net/browse.php

Datakid

Eclipse BIRT http://www.eclipse.org/birt/phoenix/project/description.php

crowne

I believe KNIME deserves to join this list as well.

radek

Weka is strong for classification and /machine learning/. Many consider this more a part of artificial intelligence than of data mining proper. RapidMiner is largely along the same lines, but with a much nicer UI. As far as I can tell, Pentaho provides the professional support for Weka.

You might want to have a look at ELKI (http://elki.dbs.ifi.lmu.de/), a comparable project that focuses on clustering and outlier detection, two other key data mining tasks.
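As a rough illustration of outlier detection, here is a simple k-nearest-neighbour distance score in plain Python: a point whose k-th nearest neighbour is far away is a likely outlier. This is not ELKI's API (ELKI is a Java framework with many more refined methods, such as LOF); the `knn_outlier_scores` function and the sample points are hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Hypothetical helper: score each point by the distance to its k-th nearest
    neighbour (the point itself excluded). Larger scores mean more isolated points."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Brute force: distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (10.0, 10.0)]
scores = knn_outlier_scores(data, k=2)
print(max(range(len(data)), key=scores.__getitem__))  # → 4
```

This version compares every pair of points, so it only suits small datasets; dedicated tools rely on index structures to make neighbourhood queries like this practical at scale.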

Has QUIT--Anony-Mousse

You can take a look at the data mining tool Weka.

Here is a collection of tutorials and videos on WEKA:

Tutorials: http://www.dataminingtools.net/browsetutorials.php?tag=weka

Videos: http://www.dataminingtools.net/videos.php?id=6


Along with the tools, I would strongly suggest learning Python and R. These languages help a lot during analysis, and they let you run custom analyses on large datasets. You could also build your own dashboard in JavaScript (check out the numerous charting and visualization libraries).

sprezzatura

I am a Python user myself, and I have to say:

Yes! All of that can be done in Python.

I last played around with Beautiful Soup[0]. It's a really simple-to-use module that lets you grab/mine data from HTML and XML (excellent for 'screen scraping').

If you don't know Python... well, it's really easy to learn.

[0]http://www.crummy.com/software/BeautifulSoup/

machinaut