Questions tagged [weka]

Weka (Waikato Environment for Knowledge Analysis) is an open source machine learning library written in Java.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Weka is open source software issued under the GNU General Public License.

Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.

The Explorer interface features several panels providing access to the main components of the workbench:

  • The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
  • The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree).
  • The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
  • The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
  • The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.
  • The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
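
As a rough illustration of the kind of transformation a Preprocess filter performs (turning a numeric attribute into a discrete one), the sketch below bins values into equal-width intervals using only the Java standard library. It is not Weka's own filter implementation; the class name EqualWidthDiscretizer, the method discretize, and the sample temperature values are all hypothetical.

```java
import java.util.Arrays;

/** Equal-width binning of one numeric attribute (illustrative only; not Weka's filter code). */
final class EqualWidthDiscretizer {

    /** Maps each value to a bin index in the range 0 .. bins - 1. */
    static int[] discretize(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / bins;

        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has zero width, so everything goes into bin 0.
            int bin = (width == 0.0) ? 0 : (int) ((values[i] - min) / width);
            // The maximum sits exactly on the upper edge; clamp it into the last bin.
            result[i] = Math.min(bin, bins - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data for a single numeric attribute.
        double[] temperature = {64, 65, 68, 70, 72, 75, 80, 85};
        System.out.println(Arrays.toString(discretize(temperature, 3)));
    }
}
```

With three bins over the range 64 to 85, each bin is 7 units wide, so values below 71 fall into bin 0, values from 71 up to (but not including) 78 fall into bin 1, and the rest fall into bin 2.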

Online Resources:

  • Use Weka in your Java Code
  • Weka on SourceForge
  • Weka on GitHub

3033 questions

11 votes, 3 answers

Weka Predictions to CSV

I've trained a classifier in Weka, and I'm able to use it on test data. Additionally, I can opt to display the classifier's predictions in the log window for this test data. However, for my current project, it would be convenient for me to be able…
elliottbolzan

10 votes, 4 answers

How to add LibSVM class to WEKA classpath on a Mac

I am running Mac OS X 10.7 Lion and I want to use WEKA with LibSVM from the command line. I get this error: Problem evaluating classifier: libsvm classes not in CLASSPATH! I found the LibSVM library here. I need to add it to my Java classpath so that…
Dan

10 votes, 4 answers

K-means with really large matrix

I have to perform k-means clustering on a really huge matrix (about 300,000 x 100,000 values, which is more than 100 GB). I want to know whether I can use R or Weka to do this. My computer is a multiprocessor with 8 GB of RAM and hundreds of GB…
Delphine

10 votes, 3 answers

What is Class Index in WEKA?

I have to use WEKA in my Java code for prediction. Basically I have to study some given code and reuse it: testdata.setClassIndex(data.numAttributes() - 1); I am unable to understand what the above line means. What is a Class Index? testdata and data…
GiriB

10 votes, 3 answers

Weka CSVLoader wrong number of values. Read 2, expected 23

I am trying to convert a CSV to ARFF using Weka's CSVLoader from the GUI. In the options I set the enclosure character for strings to be ", although there are no quotes in my file. I get the following error: weka.core.converters.CSVLoader failed to…
fiacobelli

10 votes, 3 answers

Weka's PCA is taking too long to run

I am trying to use Weka for feature selection using the PCA algorithm. My original feature space contains ~9000 attributes in 2700 samples. I tried to reduce the dimensionality of the data using the following code: AttributeSelection selector = new…
amit

10 votes, 2 answers

Weka: Results of each fold in 10-fold CV

For Weka Explorer (GUI), when we do a 10-fold CV for any given ARFF file, then what Weka Explorer provides (as far as I can see) is the average result for all the 10 folds. Q. Is there any way to get the results of each fold? For instance, I need…
Rushdi Shams

9 votes, 3 answers

How to cluster an instance with Weka's DBSCAN?

I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand I should be using the clusterInstance() method for this, but to my surprise, when taking a look at the code of that method, it looks like the…
Oak

9 votes, 1 answer

How to get the nearest neighbor in weka using java

I've been trying to use the IBk nearest neighbor algorithm that comes with the Weka machine learning library. I know how to classify instances, but I want to implement the collaborative filtering feature so I need to actually get the list of…
kamikaze_pilot

9 votes, 2 answers

ARFF for natural language processing

I'm trying to take a set of reviews, and convert them into the ARFF format for use with WEKA. Unfortunately either I completely misunderstand how the format works, or I'll have to have an attribute for ALL possible words, then a presence indicator.…
Dean Barnes

9 votes, 4 answers

Which datamining tool to use?

Can somebody explain to me the main pros and cons of the best-known open-source data mining tools? Everywhere I read that RapidMiner, Weka, Orange, and KNIME are the best ones (look at this blog post). Can somebody do a fast technical comparison in a small…
user2670818

9 votes, 1 answer

How to calculate the nearest neighbors using weka from the command line?

I have a CSV file, where each row is a vector of numbers representing a data point. I want to use Weka from the command line to calculate the nearest neighbor of each data point in the CSV file. I know how to do k nearest neighbor classification…
Mike Izbicki

9 votes, 3 answers

Can TF/IDF take classes into account?

Using a classification algorithm (for example naive Bayes or SVM) and StringToWordVector, would it be possible to use TF/IDF and to count term frequency in the whole current class instead of just looking in a single document? Let me explain, I…
Loic

9 votes, 1 answer

Why does the C4.5 algorithm use pruning in order to reduce the decision tree and how does pruning affect the prediction accuracy?

I have searched on Google about this issue and I can't find something that explains this algorithm in a simple yet detailed way. For instance, I know the ID3 algorithm doesn't use pruning at all, so if you have a continuous characteristic, the…
ksm001

8 votes, 1 answer

How do I use a JSON file with weka

I have a JSON file and want to open the data in weka, but when I do, I get the following error: Looking around on the mailing list, there are a few questions about JSON, but TL;DR except that I noticed talk of JSON in the "format weka expects". Of…
Pat