ELKI relies on double
for representing numbers. If you need a higher precision, you will have to implement your own parser and output modules (it's easy though, as we have a highly modular architecture).
Default output serialization to text is handled by Java. Precision is therefore what you get from Java by default. This should be 15-16 digits of precision, if you are using DoubleVector
, and 7-8 digits if you are using FloatVector
.
A quick check with groovysh:
new DoubleVector([12345.678901234567890, 3456.109453] as double[]);
===> 12345.678901234567 3456.109453
new FloatVector([12345.678901234567890, 3456.109453] as float[]);
===> 12345.679 3456.1094
yields only the loss to be expected from double
and float
precision.
The best way to get row labels is to... add row labels to your data.
Wrt. to your add-on question in the comments: The default parser will treat a text row at the beginning of your file as column labels. So just put "X Y" into the first line of your file.
A reasonable input format will therefore be:
X Y Label
1 2 Point7
3 4 "Point 8"
The following are not-so-good ideas:
5 6 123shouldwork
7 8 don't do this: 3 parser will retain the 3
label should be non-numeric, so that the parser will treat it as label automatically. Otherwise, you have to set the appropriate parameter.
DBIDs are meant for internal handling. Maybe we should not write them to the output at all. FixedDBIDFilter
is a hackish work-around; it is meant to be used to get reproducible hashing when using algorithms that need id-based hashing and doing multiple runs in the MiniGUI. Because on multiple runs, DBIDs will be continuously enumerated.