
I want to use flexclust::distEuclidean, but I'm not sure how centers should be specified. There are no examples given in the documentation at ?distance.

So I checked the source of this function (which turns out to be short):

function (x, centers) 
{
    if (ncol(x) != ncol(centers)) 
        stop(sQuote("x"), " and ", sQuote("centers"), " must have the same number of columns")
    z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
    for (k in 1:nrow(centers)) {
        z[, k] <- sqrt(colSums((t(x) - centers[k, ])^2))
    }
    z
}
<environment: namespace:flexclust>

If p is the number of features in my data and k is the number of clusters, should centers be a k x p matrix, i.e. with consecutive centroids in rows?

Actually, it has to be like that, as this function first checks that ncol(x) == ncol(centers). But then we have

   z[, k] <- sqrt(colSums((t(x) - centers[k, ])^2)) 

How does t(x) - centers[k, ] even work? t(x) is a p x n matrix and centers[k, ] is a 1 x p vector, so the dimensions don't match...

Zheyuan Li
jakes

1 Answer


t(x) is a p x n matrix and centers[k, ] is a 1 x p vector, so the dimensions don't match...

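No, centers[k, ] is just a vector without a dim attribute, because single-bracket indexing drops the matrix dimension by default (drop = TRUE). In t(x) - centers[k, ], R's recycling rule therefore applies: the length-p vector is recycled down each column of the p x n matrix t(x), so the k-th center is subtracted from every observation.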

You will get the failure you expected if you do t(x) - centers[k, , drop = FALSE].
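To make the recycling concrete, here is a small base-R sketch on made-up toy data (the values and the name center1 are only for illustration). It checks that an extracted row carries no dim attribute and that subtracting it from t(x) matches an explicit sweep over the rows of t(x).

```r
# Hypothetical toy data: n = 3 observations, p = 2 features
x <- matrix(c(1, 2, 3,
              10, 20, 30), nrow = 3)                  # 3 x 2
centers <- matrix(c(1, 10,
                    3, 30), nrow = 2, byrow = TRUE)   # k = 2 centers, one per row

center1 <- centers[1, ]   # drop = TRUE is the default
dim(center1)              # NULL: a plain vector of length p

# t(x) is p x n and stored column by column, so a length-p vector
# is recycled once per column, i.e. once per observation
diffs <- t(x) - center1

# Compare with explicitly sweeping the vector over the rows of t(x)
all.equal(diffs, sweep(t(x), 1, center1))
```

Each column of diffs is then one observation minus the first center, which is what the colSums() call in the function body relies on.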


A simple example for you to digest:

x <- matrix(1:6, nrow = 3)
y <- x
x - y[, 1]  ## works: y[, 1] is a plain vector and gets recycled
x - y[, 1, drop = FALSE]  ## error: non-conformable arrays
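
As for how centers should be laid out, the following sketch re-implements the loop quoted in the question in base R, under the hypothetical name euclid_to_centers (it is not a flexclust function, and the data are made up). It assumes centers is a k x p matrix with one centroid per row, which is what the ncol() check and the centers[k, ] indexing suggest.

```r
# Base-R sketch of the loop quoted in the question
# (euclid_to_centers is a hypothetical name, not part of flexclust)
euclid_to_centers <- function(x, centers) {
  stopifnot(ncol(x) == ncol(centers))
  z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
  for (k in seq_len(nrow(centers))) {
    # centers[k, ] is a length-p vector, recycled over the columns of t(x)
    z[, k] <- sqrt(colSums((t(x) - centers[k, ])^2))
  }
  z
}

x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)         # n = 3, p = 2
centers <- matrix(c(0, 0,
                    3, 4), ncol = 2, byrow = TRUE)   # k = 2, p = 2

d <- euclid_to_centers(x, centers)
dim(d)   # n x k: one row per observation, one column per center
d
```

Entry d[i, k] is the Euclidean distance from observation i to centroid k, so the result has as many columns as centers has rows.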
Zheyuan Li
  • Thanks. I totally forgot that extracting rows from a matrix returns a "column vector" by default. – jakes Jul 10 '18 at 17:02