For the iris data set, I am trying to find the Mahalanobis distances between each pair of species. I tried the following:

group <- matrix(iris$Species) 
group <- t(group[,-5])

variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[,variables])

mahala_sq <- pairwise.mahalanobis(x=variables, grouping=group)

But I get the error message:

Error in pairwise.mahalanobis(x = variables, grouping = group) : nrow(x) and length(grouping) are different

Ben Bolker
daisybeats
  • please tell us what package `pairwise.mahalanobis` is from ... – Ben Bolker Sep 23 '20 at 22:22
  • Does this answer your question? [Mahalanobis distance with multiple observations per group](https://stackoverflow.com/questions/29608280/mahalanobis-distance-with-multiple-observations-per-group) – Andre Silva Mar 21 '21 at 17:00

1 Answer

This works:

HDMD::pairwise.mahalanobis(x=iris[,1:4], grouping=iris$Species)
  • x should be a numeric matrix of observations (columns=variables, rows=observations)
  • grouping should be a "vector of characters or values designating group classification for observations" with length equal to nrow(x)

I realized in editing your question that the problem stems from a typo (you assigned `varibles` instead of `variables`); if you fix that typo, your code seems to work (at least it doesn't throw an error). (I still claim that my solution is simpler ...)
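
To see why the typo leads to that particular error message, here is a minimal base-R sketch (it does not call HDMD at all; `x` is just an illustrative name). Because the matrix is stored under the misspelled name, `variables` remains the character vector of column names, which has no rows to match against the grouping:

```r
# The vector of column names from the question
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# The typo: the numeric matrix is stored under a different name ...
varibles <- as.matrix(iris[, variables])

# ... so `variables` is still a character vector with no rows
is.character(variables)
nrow(variables)   # NULL for a plain vector

# Storing the matrix under a distinct, correctly spelled name avoids the clash
x <- as.matrix(iris[, variables])
nrow(x) == length(iris$Species)
```

Passing something like `x` (rather than `variables`) as the `x` argument is what the function expects.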

If you wanted to be a little more careful you could use `x <- iris[colnames(iris) != "Species"]` (or a `subset(select=)` or `dplyr::select()` analog) to refer to the omitted column by name rather than position.
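
As a rough illustration of selecting by name, the following base-R sketch builds the predictor data frame two ways and checks that they agree (`x` and `x_sub` are illustrative names, not anything required by the function):

```r
# Drop the grouping column by name using logical indexing on column names
x <- iris[colnames(iris) != "Species"]

# Base-R alternative using subset(select = )
x_sub <- subset(iris, select = -Species)

names(x)
identical(names(x), names(x_sub))
```

Either object could then be passed as `x`, with `iris$Species` as the grouping.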

If you want (for some reason) to run this analysis with a single response variable, you need to use drop=FALSE to prevent a one-column matrix from being collapsed to a vector, i.e. use x=iris[,1,drop=FALSE]
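
A small base-R sketch of what `drop=FALSE` changes (note that because `iris` is a data frame, the result here is a one-column data frame rather than a matrix; `v` and `m` are illustrative names):

```r
v <- iris[, 1]                 # collapses to a plain numeric vector
m <- iris[, 1, drop = FALSE]   # keeps the two-dimensional shape

is.null(dim(v))                # the vector has no dimensions, so nrow(v) is NULL
dim(m)                         # rows and columns are preserved
nrow(m) == length(iris$Species)
```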

Ben Bolker
  • Thank you. Does that give the squared Mahalanobis distances? To get the Mahalanobis distance would you just do the following: sqrt(m.d$distance)? where m.d is output from your answer – daisybeats Sep 23 '20 at 22:36
  • I have no idea ... read the documentation of the function to find out ... ? or, if the documentation isn't clear, set up an example where you know the answer (and the MD is quite different from 1) to find out ... or compare the answer for one of the pairs to the built-in `stats::mahalanobis()` function, which explicitly says it returns squared MD ... (PS, search for "squared" in `?pairwise.mahalanobis` ...) – Ben Bolker Sep 23 '20 at 22:38
  • Thank you. Can confirm it is the squared distance. How would I alter the code to just test for one attribute, say sepal length. I tried the following but had no luck: `pairwise.mahalanobis(x=iris[,1], grouping=iris$Species)` – daisybeats Sep 24 '20 at 00:07
  • about squared distances, **please read the documentation** ... I suggested how you could find this information in `?pairwise.mahalanobis`. – Ben Bolker Sep 24 '20 at 13:59
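
Following the suggestion in the comments to compare against `stats::mahalanobis()` (which is documented as returning the squared distance), here is one way such a cross-check might look for a single pair of species. This is only a sketch: it assumes a pooled within-group covariance matrix, which may or may not match what `pairwise.mahalanobis` uses internally, so check its documentation before comparing numbers.

```r
x <- as.matrix(iris[, 1:4])
g <- iris$Species

# Group means: one row per species, one column per variable
means <- apply(x, 2, function(col) tapply(col, g, mean))

# Pooled within-group covariance (an assumption made for this sketch)
resid  <- x - means[as.character(g), ]
pooled <- crossprod(resid) / (nrow(x) - nlevels(g))

# stats::mahalanobis() returns the SQUARED distance
d2 <- mahalanobis(means["setosa", ], center = means["versicolor", ], cov = pooled)

d2        # squared Mahalanobis distance between the two group means
sqrt(d2)  # the distance itself
```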