2

I have two groups, that each group has 3 variables such as following:

Group1:
     cost time quality
[1,]   90    4      70
[2,]    4   27      37
[3,]   82    4      17
[4,]   18   41       4

Group2:

       cost time quality
[1,]    4   27       4

codes to calculate mahalanobis distance between two groups are as following:

      benchmark<-rbind(c(90,4,70),c(4,27,37),c(82,4,17),c(18,41,4))
           colnames(benchmark)=c('cost','time','quality')
           current=rbind(c(4,27,4))
           colnames(current)=c('cost','time','quality')
    bdm<-as.matrix(benchmark)
    cdm<-as.matrix(current)
  mat1<-matrix(bdm,ncol=ncol(bdm),dimnames=NULL)
        mat2<-matrix(cdm,ncol=ncol(cdm),dimnames=NULL)

        #center Data
        mat1.1<-scale(mat1,center = T,scale = F)
        mat2.1<-scale(mat2,center=T,scale=F)

        #cov Matrix
        mat1.2<-cov(mat1.1,method="pearson")

        mat2.2<-cov(mat2.1,method="pearson")

        #the pooled covariance is calculated using weighted average
        n1<-nrow(mat1)
        n2<-nrow(mat2)
        n3<-n1+n2
        #pooled matrix
        #pooled matrix
        mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)

        mat4<-solve(mat3)

        #Mean diff
        mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
        #multiply
        mat6<-t(mat5)%*%mat4
        #Mahalanobis distance  
        sqrt(mat6 %*% mat5)

The Result is NA but when I entered the values in the following link calculate mahalanobis distance to calculate the mahalanobis distance it shows Mahalanobis Distance between group1 and group2 = 2.4642

Moreover the error message that I got is :

Error in ((n1/n3) * mat1.2) + ((n2/n3) * mat2.2) : non-conformable arrays

and the Warning message:

In colMeans(mat1) - colMeans(mat2) :
  longer object length is not a multiple of shorter object length
user
  • 592
  • 6
  • 26
  • is it right if I replace the NA in mat2.2 to zero by adding mat2.2[is.na(mat2.2)] <- 0 to get rid of NA matrix – user Jan 10 '16 at 09:44

1 Answers1

2

I felt like what you are trying to do must exist in some R package. After a pretty thorough search, I found function D.sq in package asbio which looks very close. This function requires 2 matrices as input, so it doesn't work for your example. I also include a modified version that accepts a vector for the 2nd matrix.

# Original Function
D.sq <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - colMeans(g2))
    S1 <- cov(g1)
    S2 <- cov(g2)
    n1 <- nrow(g1)
    n2 <- nrow(g2)
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 
        1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}

# Data
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(2, 27, 4)

# Function modified to accept a vector for g2 rather than a matrix
D.sq2 <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - g2)
    S1 <- cov(g1)
    S2 <- var(g2)
    n1 <- nrow(g1)
    n2 <- length(g2)
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 
        1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}

However, this doesn't quite give the answer you expect: D.sq2(g1,g2)$D.sq returns 2.2469.

Perhaps you can compare your original matlab method with these details and figure out the source of the difference. A quick look suggests the difference is how the denominator in V is computed. It may well also be an error on my part, but hopefully this gets you going.

Bryan Hanson
  • 6,055
  • 4
  • 41
  • 78
  • Thank you so much , yes the difference is in the denominator in v – user Jan 11 '16 at 01:39
  • Assume that we have 3*3 matrix of g2 instead of a vector su ac following :Then I used apply function to convert a matrix to a vector: 'current<-rbind(c(4,27,4),c(3,2,5),c(2,4,1)) mat2<- apply(mat2,2,as.vector)' and a for loop to apply D.sq2 function on each row of vector. 'g2<- apply(g2,2,as.vector) for(i in 1:length(g2)){apply(g2,1,(g1,g2))}' but it does not work and I got following error: ''Error in t(dbar) %*% solve(V) : non-conformable arguments'' – user Jan 11 '16 at 01:54
  • I don't quite follow your new case. I did notice that the `mahalanobis` function in `stats` will take a matrix as one input, and a vector with the same number of columns, and give you the distance for each row. Is that what you want? Consider `mahalanobis(g1, g2, cov(g1))` gives a vector of length 4, since `g1` has 4 rows. – Bryan Hanson Jan 11 '16 at 01:56
  • Yes I want exactly the distance of each row from the benchmark matrix – user Jan 11 '16 at 01:59
  • Then I think you can use `mahalanobis` directly. You can type `mahalanobis` at the prompt and see the code it is using. – Bryan Hanson Jan 11 '16 at 02:01
  • Thank you very much, I will Try and let you know – user Jan 11 '16 at 02:03
  • I used mahalanobis(g2,g1,cov(g1)) and it works well but it gives me a warning message: In sweep(x, 2L, center) : STATS is longer than the extent of 'dim(x)[MARGIN]' – user Jan 11 '16 at 02:08
  • You've got `g1` and `g2` reversed in the code above. If you want `g1` centered, try `g1c <- scale(g1, scale = FALSE); res <- mahalanobis(g1c, g2, cov(g1c))` – Bryan Hanson Jan 11 '16 at 02:14
  • but when I put g1 first and then g2 it gives me 4 values for the distance but I have only 3 rows of g2, means that I should have 3 distance values , one for each row of g2. what if I write : mahalanobis(g2,colMeans(g1),cov(g1)) – user Jan 11 '16 at 02:35
  • `g1` has 4 rows so it should give 4 distances when compared to one vector. If `g2` now has 3 rows, which comparison is to be made? `g1` against each row of `g2` in separate steps? That would require a simple loop and give a total of 12 distances. Is that what you are after? – Bryan Hanson Jan 11 '16 at 02:43
  • I think I've lost track of what you are asking. The original question is answered, I think. The new problem has `g1` as a 4 x 3 matrix and `g2` as a 3 x 3 matrix. I think this should probably be a new question, perhaps linking back to this one. – Bryan Hanson Jan 11 '16 at 02:52
  • Yes the original question is answered. – user Jan 11 '16 at 03:10