
I want to calculate the normalized Euclidean distance between two vectors of length 5. The simpler approach with Apache Commons Math and RealVector does not normalize the distance, so I am trying Weka. I have the following Java code:

    import weka.core.Attribute;
    import weka.core.EuclideanDistance;
    import weka.core.FastVector;
    import weka.core.Instance;
    import weka.core.Instances;

    // Weka 3.6 API: FastVector and the concrete Instance class
    Attribute one = new Attribute("one");
    Attribute two = new Attribute("two");
    Attribute three = new Attribute("three");
    Attribute four = new Attribute("four");
    Attribute five = new Attribute("five");

    FastVector attributes = new FastVector();
    attributes.addElement(one);
    attributes.addElement(two);
    attributes.addElement(three);
    attributes.addElement(four);
    attributes.addElement(five);

    Instances wVector = new Instances("Vector", attributes, 0);

    Instance firstInstance = new Instance(attributes.size());
    firstInstance.setDataset(wVector);
    firstInstance.setValue(one, 1.0);
    firstInstance.setValue(two, 2.0);
    firstInstance.setValue(three, 3.0);
    firstInstance.setValue(four, 4.0);
    firstInstance.setValue(five, 5.0);

    Instance secondInstance = new Instance(attributes.size());
    secondInstance.setDataset(wVector);
    secondInstance.setValue(one, 10.0);
    secondInstance.setValue(two, 20.0);
    secondInstance.setValue(three, 30.0);
    secondInstance.setValue(four, 40.0);
    secondInstance.setValue(five, 50.0);

    EuclideanDistance ed = new EuclideanDistance(wVector);

    // Normalized distance (the default)
    double wDist = ed.distance(firstInstance, secondInstance);

    // Non-normalized distance
    ed.setDontNormalize(true);
    double wDist1 = ed.distance(firstInstance, secondInstance);

Why is the non-normalized distance wDist1 calculated correctly, while the normalized distance wDist comes out as NaN?

tobias_k
frankenstein

1 Answer


The normalization of the distance is based on the ranges (minimum and maximum) of the attribute values over the instances of the data set that the distance function was created with.
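To see why an empty data set leads to NaN, here is a minimal plain-Java sketch of min-max normalized Euclidean distance (no Weka involved). The class and method names (NormalizedEuclidean, ranges, norm, distance) are hypothetical. Starting the ranges at +infinity/-infinity when there is no data is an assumption meant to mimic an "uninitialized" range; it is not Weka's actual code.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of min-max normalized Euclidean distance. The per-attribute ranges
 * come from a reference data set. Hypothetical helper, not the Weka API.
 */
public class NormalizedEuclidean {

    /** Returns {min[], max[]} per attribute over the reference data. */
    static double[][] ranges(List<double[]> data, int numAttributes) {
        double[] min = new double[numAttributes];
        double[] max = new double[numAttributes];
        // Assumed "empty" range: with no rows, these values are never updated.
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int j = 0; j < numAttributes; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return new double[][] {min, max};
    }

    /** Scales x into [0, 1] using the attribute's range. */
    static double norm(double x, double min, double max) {
        if (max == min) {
            return 0.0; // constant attribute: contributes nothing
        }
        // With an empty range this is (-inf) / (-inf), which is NaN.
        return (x - min) / (max - min);
    }

    /** Euclidean distance between the normalized vectors. */
    static double distance(double[] a, double[] b, double[][] ranges) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = norm(a[j], ranges[0][j], ranges[1][j])
                     - norm(b[j], ranges[0][j], ranges[1][j]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] first = {1, 2, 3, 4, 5};
        double[] second = {10, 20, 30, 40, 50};

        // No reference data: the ranges stay infinite, so the result is NaN.
        double[][] empty = ranges(List.of(), first.length);
        System.out.println(distance(first, second, empty)); // NaN

        // Reference data holding both vectors: each attribute maps to 0 and 1.
        double[][] filled = ranges(List.of(first, second), first.length);
        System.out.println(distance(first, second, filled)); // sqrt(5), about 2.236
    }
}
```

A single NaN attribute poisons the whole sum, which is why the entire distance becomes NaN rather than just one term.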

Your wVector data set does not contain any instances, so there are no ranges to normalize with. Add both instances to the data set before you create the EuclideanDistance:

    wVector.add(firstInstance);
    wVector.add(secondInstance);

Then it should work as expected.
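As a sanity check on the question's numbers, here are the values to expect. This assumes the normalization is per-attribute min-max scaling over the two instances once they are in wVector; the exact formula is an assumption, not something the answer states.

```latex
\begin{aligned}
d_{\text{raw}} &= \sqrt{\sum_{j=1}^{5} (b_j - a_j)^2}
  = \sqrt{9^2 + 18^2 + 27^2 + 36^2 + 45^2} = \sqrt{4455} = 9\sqrt{55} \approx 66.75 \\
\hat{x}_j &= \frac{x_j - \min_j}{\max_j - \min_j}, \qquad \hat{a}_j = 0, \quad \hat{b}_j = 1 \\
d_{\text{norm}} &= \sqrt{\sum_{j=1}^{5} (\hat{b}_j - \hat{a}_j)^2} = \sqrt{5} \approx 2.236
\end{aligned}
```

Under that assumption, wDist1 should come out near 66.75 and wDist near 2.236.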

Marco13