I'm trying to discretize a data set using RWeka's Discretize filter. My data set starts out at over a million records, but the returned set has only about 100k. The function looks like it is supposed to return the bin for each record in the set, so where did 90% of my data go?
There are NAs in the data, so I tried passing na.action=na.pass and got the following:
> disc_data = Discretize(class~.,data=num_data, na.action=na.pass)
Error in .jarray(x) : java.lang.OutOfMemoryError: Java heap space
I'm working from someone else's code, and this doesn't seem to have been a problem before. I'm not sure whether this is a gap in my understanding of discretization or of R. Can anybody explain how Discretize is supposed to work and what is going on?
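For anyone trying to reproduce this, here is a minimal sketch of how one might check whether NA handling alone accounts for the missing rows. It assumes the formula interface builds a model frame, so R's default na.action (normally na.omit) would drop every row containing an NA. The small data frame is a hypothetical stand-in for my real num_data, and only base R is used, so the JVM is not involved.

```r
# Hypothetical stand-in for num_data: 10 rows, 4 of which contain an NA.
num_data <- data.frame(
  x = c(1.2, NA, 3.4, 4.1, NA, 6.0, 7.7, 8.3, NA, 10.5),
  y = c(5, 6, NA, 8, 9, 10, 11, 12, 13, 14),
  class = factor(rep(c("a", "b"), 5))
)

# Total rows versus rows with no NA in any column.
n_total <- nrow(num_data)
n_complete <- sum(complete.cases(num_data))

# Rows that survive a model frame with the session's default na.action
# (na.omit unless options("na.action") has been changed).
n_default <- nrow(model.frame(class ~ ., data = num_data))

# Rows that survive when NAs are passed through untouched.
n_pass <- nrow(model.frame(class ~ ., data = num_data, na.action = na.pass))

cat("total:", n_total, "complete:", n_complete,
    "default na.action:", n_default, "na.pass:", n_pass, "\n")
```

If the complete-case count on the real data is close to 100k, the NA rows would explain the shrinkage, and the heap error may just be the result of passing the full million rows to Java. As far as I know, rJava takes its heap size from options(java.parameters = "-Xmx...") set before the package is loaded, so that may be worth trying as well.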