I'm using Weka to cluster binary data. Note that I use Weka directly, through the API or the source code.
My input is a huge .csv file, for example:
attrib1, attrib2, atrib3
0,1,0
1,0,1
0,0,1
But to reduce the .csv size, the data provider (I don't have direct access to the dataset) omits the zeros, so the above snippet is written as:
attrib1, attrib2, atrib3
,1,
1,,1
,,1
I found that Weka treats an empty value between two commas as a "Missing Value" (that's the term used in the code base), which is not what I want.
I've been trying to work it out directly through the source code.
In particular, CSVLoader.getDataSet() and CSVLoader.getInstance(), along with ConverterUtils.getToken(), seem to be responsible for this behaviour.
I've tried many changes to the code to make Weka treat these null values (because that's what Weka thinks they are) as zeros, but I can't find a solution.
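To make the goal concrete, the mapping I'm after is equivalent to the preprocessing step sketched here. This is plain Java with no Weka calls. The class and method names (EmptyFieldFiller, fillLine, fillAll) are made up for illustration, and it assumes the fields never contain quoted commas, which should hold for 0/1 data. The first line is treated as the header and left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: rewrites empty CSV fields as "0".
final class EmptyFieldFiller {

    // Replaces every empty (or whitespace-only) field in one CSV line with "0".
    // Assumption: no quoted fields, so a plain split on ',' is safe.
    static String fillLine(String line) {
        // The -1 limit keeps trailing empty fields, so ",1," gives three fields.
        String[] fields = line.split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            String field = fields[i].trim();
            out.append(field.isEmpty() ? "0" : field);
        }
        return out.toString();
    }

    // Applies fillLine to every data row; the header (first line) and
    // completely blank lines are copied unchanged.
    static List<String> fillAll(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0 || line.trim().isEmpty()) {
                result.add(line);
            } else {
                result.add(fillLine(line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sparse = Arrays.asList(
                "attrib1, attrib2, atrib3", ",1,", "1,,1", ",,1");
        for (String row : fillAll(sparse)) {
            System.out.println(row);
        }
    }
}
```

Running this over the whole file before loading would work, but it means an extra pass over a huge input, which is why I was hoping to handle it inside the loader instead.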
Can someone provide a better solution?