I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.
So my questions are:
- How does clara handle NAs?
- Can this somehow be used for kmeans (NAs not allowed)?
[Update] I did find these lines of code in the clara function:
inax <- is.na(x)
valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))
x[inax] <- valmisdat
which replace every missing value with valmisdat. I'm not sure I understand the reason for using such a formula. Any ideas? Would it be more "natural" to treat NAs in each column separately, maybe replacing them with the column mean/median?
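To make the comparison concrete, here is a minimal sketch on made-up toy data. The matrix x and the helper impute_median are hypothetical names for illustration, not part of cluster. It applies the replacement quoted above, then the per-column median alternative I have in mind, which at least gives kmeans an NA-free input. It does not show what clara does with the replaced values afterwards, which is the part I don't understand.

```r
# Toy data with missing values (made up for illustration):
# column 1 is 1, 2, NA, 4 and column 2 is 10, NA, 30, 40
x <- matrix(c(1, 2, NA, 4,
              10, NA, 30, 40), ncol = 2)

# The lines quoted from clara: one global replacement value for all NAs
inax <- is.na(x)
valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))  # 1.1 * 40 here
x_global <- x
x_global[inax] <- valmisdat

# Per-column alternative: replace NAs with that column's median
# (impute_median is a hypothetical helper, not a cluster function)
impute_median <- function(m) {
  for (j in seq_len(ncol(m))) {
    nas <- is.na(m[, j])
    m[nas, j] <- median(m[, j], na.rm = TRUE)
  }
  m
}
x_median <- impute_median(x)

# kmeans does not accept NAs, but it runs on the imputed matrix
set.seed(1)
fit <- kmeans(x_median, centers = 2)
```

With the global value, a single NA in a small-scale column gets a number driven by the largest-magnitude entry anywhere in the matrix, which is why per-column imputation looks more natural to me for kmeans.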