
Why would someone use ARFF? And could someone please give sample code to read an ARFF file and make use of it in Java?

I found the following snippet of code on the Weka site:

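import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader);
Instances data = arff.getData();
data.setClassIndex(data.numAttributes() - 1);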

What comes after that? Can someone explain what's going on above? How can I access my data from the file? Also, the Weka site mentions two different usages, namely batch and incremental. What's the difference between the two?

alecxe
raghu

2 Answers


Well, usually someone would use ARFF because it's a very simple file format: basically a CSV file with a header describing the data. It's also the usual way to save and read data with Weka.
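To make the "CSV with a header" description concrete, here is a deliberately simplified, hypothetical splitter (the class ArffFormatSketch and its members are made up for illustration, not Weka API). It assumes plain, unquoted values and ignores sparse rows and relational attributes, so it only shows the layout of the format; for real files, use Weka's ArffReader as in the question.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified ARFF splitter: header lines (@relation, @attribute)
 * come first, then one comma-separated row per instance after @data.
 * Assumes no quoting, no sparse rows and no relational attributes.
 */
public class ArffFormatSketch {
    public final List<String> attributeNames = new ArrayList<>();
    public final List<String[]> rows = new ArrayList<>();

    public static ArffFormatSketch parse(String text) {
        ArffFormatSketch result = new ArffFormatSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank lines and % comments carry no data
            }
            String lower = line.toLowerCase();
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name
                result.attributeNames.add(line.split("\\s+")[1]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true; // everything after this line is data
            } else if (inData) {
                // each data line is one instance, just like a CSV row
                result.rows.add(line.split(","));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@relation weather",
            "@attribute temperature numeric",
            "@attribute humidity numeric",
            "@attribute play {yes,no}",
            "@data",
            "85,85,no",
            "70,96,yes");
        ArffFormatSketch parsed = parse(arff);
        System.out.println(parsed.attributeNames); // [temperature, humidity, play]
        System.out.println(parsed.rows.size());    // 2
    }
}
```

In Weka, calling data.setClassIndex(data.numAttributes() - 1) on such a file would make the last attribute (play) the class.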

The sample code to read an ARFF file is exactly the one you provided; if you want to make use of the loaded instances, you work with the data object. To print them: System.out.println(data); You can find many examples of how to work with the data (classification, clustering, etc.) here.

The code you are using wraps the ARFF file in a standard BufferedReader, then creates an ArffReader instance (arff), which reads the data completely from the reader. After that, the getData method returns the data as an Instances object (called data). Finally, you set which attribute is the class (here, the last one in your ARFF file).

If you want to iterate over the Instances object and retrieve each instance:

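// iterate over every instance (row) of the data set
for (int i = 0; i < data.numInstances(); i++) {
    Instance instance = data.instance(i);   // needs import weka.core.Instance;
    // stringValue() works for nominal, string, date and relational attributes;
    // for a numeric attribute use instance.value(0) instead
    System.out.println(instance.stringValue(0)); // get attribute 0 as a String
}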

You are asking about batch versus incremental reading of an ARFF file. Batch mode reads the whole ARFF file at once, while incremental mode lets you read the instances (lines) of the file one at a time and add each one to the data set yourself, or process it without storing it.

Code for incremental mode:

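import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
// this constructor reads only the header and reserves space for 1000 instances
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();   // empty data set with the header's attributes
data.setClassIndex(data.numAttributes() - 1);
Instance inst;
// readInstance returns the next row, or null at the end of the file
while ((inst = arff.readInstance(data)) != null) {
    data.add(inst);   // optional: inst could be processed here instead of stored
}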
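The practical difference between the two modes is memory: batch keeps every row, while incremental only needs the current one. The following plain-Java analogy (not Weka code; BatchVsIncremental and its methods are hypothetical names) computes the mean of one column both ways, assuming a simple ARFF text whose data rows are unquoted numbers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java analogy of batch vs incremental reading; the caller owns the Reader. */
public class BatchVsIncremental {

    /** Consumes header lines up to and including the @data line. */
    static void skipHeader(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().toLowerCase().startsWith("@data")) {
                return;
            }
        }
    }

    /** Batch: load every data row into memory first, then process. */
    public static double batchMean(Reader source, int column) throws IOException {
        BufferedReader in = new BufferedReader(source);
        skipHeader(in);
        List<String[]> rows = new ArrayList<>(); // the whole data set lives in memory
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                rows.add(line.trim().split(","));
            }
        }
        double sum = 0;
        for (String[] row : rows) {
            sum += Double.parseDouble(row[column]);
        }
        return sum / rows.size();
    }

    /** Incremental: handle one row at a time and keep only running totals. */
    public static double incrementalMean(Reader source, int column) throws IOException {
        BufferedReader in = new BufferedReader(source);
        skipHeader(in);
        double sum = 0;
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue;
            }
            sum += Double.parseDouble(line.trim().split(",")[column]);
            count++;
        }
        return sum / count;
    }

    public static void main(String[] args) throws IOException {
        String arff = "@relation r\n@attribute x numeric\n@attribute y numeric\n@data\n1,10\n2,20\n3,30\n";
        System.out.println(batchMean(new StringReader(arff), 1));       // 20.0
        System.out.println(incrementalMean(new StringReader(arff), 1)); // 20.0
    }
}
```

With Weka's incremental mode you get the same effect by dropping data.add(inst) from the loop above and working on inst directly, for example by accumulating inst.value(k) for the attribute index k you care about.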
Leandro Carracedo

I found it very difficult to build a reader because of the lack of examples and the very ambiguous javadoc. So here is a small piece of code I wrote to read a relation of relations of numerical attributes; it has been tested and works!

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

// path: location of your .arff file
BufferedReader reader = new BufferedReader(new FileReader(new File(path)));
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();
data.setClassIndex(0);

Instance inst;
while ((inst = arff.readInstance(data)) != null) {
    // the first attribute is ignored because it is the index
    for (int index = 1; index < inst.numAttributes(); index++) {
        switch (inst.attribute(index).type()) {
        case Attribute.NUMERIC:
            System.out.println(inst.value(index));
            break;
        case Attribute.STRING:
            System.out.println(inst.stringValue(index));
            break;
        case Attribute.RELATIONAL: {
            Instances relation = inst.attribute(index).relation(); // header of the relation
            Instances bag = inst.relationalValue(index);           // rows for this instance
            // test whether two relations are nested or not
            if (relation.numAttributes() > 0 && relation.attribute(0).isRelationValued()) {
                // case of an array of numeric arrays; assumes each bag holds a single row
                double[][] seq = new double[relation.numAttributes()][];
                for (int i = 0; i < relation.numAttributes(); i++) {
                    Instance q = bag.instance(0).relationalValue(i).instance(0);
                    seq[i] = new double[q.numAttributes()];
                    for (int j = 0; j < q.numAttributes(); j++) {
                        seq[i][j] = q.value(j);
                    }
                }
                System.out.println(Arrays.deepToString(seq));
            } else {
                // case with only an array of numbers; again assumes a single row
                Instance row = bag.instance(0);
                double[] seq = new double[relation.numAttributes()];
                for (int i = 0; i < relation.numAttributes(); i++) {
                    seq[i] = row.value(i);
                }
                System.out.println(Arrays.toString(seq));
            }
            break;
        }
        default:
            break;
        }
    }

    System.out.println("index is : " + ((int) inst.value(0)));
}

Here is what the data looks like: each element is composed of an index and a pair of numerical triplets:

@relation 'name of relation'

@attribute index numeric
@attribute attr1 relational
@attribute attr1.0 relational
@attribute attr1.0.0 numeric
@attribute attr1.0.1 numeric
@attribute attr1.0.2 numeric
@end attr1.0
@attribute attr1.1 relational
@attribute attr1.1.0 numeric
@attribute attr1.1.1 numeric
@attribute attr1.1.2 numeric
@end attr1.1
@end attr1

@data
0,'\'23,25,48\',\'12,0,21\''
115260,'\'34,44,72\',\'15,8,32\''
230520,'\'175,247,244\',\'107,185,239\''
345780,'\'396,269,218\',\'414,276,228\''
461040,'\'197,38,42\',\'227,40,43\''
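Each attr1 value above is a quoted bag whose single row holds two quoted inner bags, with the inner quotes escaped as \'. The helper below unpacks one such field by hand to show that encoding; NestedBagSketch.decode is a hypothetical name, not Weka API, and it assumes exactly one row per bag and numeric values only. Weka's ArffReader does this parsing for you, as in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical decoder for one nested relational field such as '\'23,25,48\',\'12,0,21\''. */
public class NestedBagSketch {

    public static double[][] decode(String field) {
        String s = field.trim();
        // 1. drop the outer quotes around the attr1 value
        if (s.startsWith("'") && s.endsWith("'")) {
            s = s.substring(1, s.length() - 1);
        }
        // 2. undo the escaping of the inner quotes: \' becomes '
        s = s.replace("\\'", "'");
        // 3. split on commas outside quotes: one piece per inner relation
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : s.toCharArray()) {
            if (c == '\'') {
                quoted = !quoted; // quotes delimit an inner bag and are not kept
            } else if (c == ',' && !quoted) {
                pieces.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        pieces.add(current.toString());
        // 4. each piece is assumed to be a single row of numbers
        double[][] result = new double[pieces.size()][];
        for (int i = 0; i < pieces.size(); i++) {
            String[] parts = pieces.get(i).split(",");
            result[i] = new double[parts.length];
            for (int j = 0; j < parts.length; j++) {
                result[i][j] = Double.parseDouble(parts[j].trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String field = "'\\'23,25,48\\',\\'12,0,21\\''";
        System.out.println(Arrays.deepToString(decode(field))); // [[23.0, 25.0, 48.0], [12.0, 0.0, 21.0]]
    }
}
```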

Hope it helps someone.

BaptisteL
Yes it is, but only in my specific case; it's not a general approach. It is just here to give an example of how to read through the ARFF object, and it might not handle some specific cases. – BaptisteL Sep 19 '17 at 08:26