
This question is similar to questions that have been asked about floating-point error in other languages (for example here); however, I haven't found a satisfactory solution.

I'm working on a project that involves investigating matrices that share certain characteristics. As part of that, I need to know how many matrices in a list are unique.

D <- as.matrix(read.table("datasource", ...))
mat_list <- vector("list", length = length(samples_list))
mat_list <- lapply(seq_along(samples_list), function(i) matrix(data = 0, nrow(D), ncol(D)))

This list is then populated with computations on the data, driven by the elements of samples_list. After mat_list has been populated, I need to remove duplicates. Running

mat_list <- unique(mat_list)

narrows things down quite a bit; however, many of the remaining elements are really within machine error of each other. The function unique does not allow one to specify a tolerance, and I was unable to find its source code to modify.
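
To make the issue concrete, here is a minimal illustration (with made-up values) of why unique alone is not enough:

m1 <- matrix(0.3, 2, 2)
m2 <- matrix(0.1 + 0.2, 2, 2)   # differs from m1 only by floating-point rounding
length(unique(list(m1, m2)))    # 2 -- unique() tests for exact equality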

One idea I had was this:

ErrorReduction <- function(mat_list, tol = 2) {
  len <- length(mat_list)
  for (i in 1:(len - 1)) {
    diff <- mat_list[[i]] - mat_list[[i + 1]]
    if (norm(diff, "i") < tol) {
      mat_list[[i + 1]] <- mat_list[[i]]
    }
  }
  mat_list <- unique(mat_list)
  return(mat_list)
}

but this only compares consecutive pairs. It would be simple, but most likely inefficient, to compare every pair with nested for loops (a rough sketch of that follows below).
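
For reference, here is what that brute-force, all-pairs version might look like (the function name is made up, and it reuses the infinity-norm test from above):

ErrorReductionAllPairs <- function(mat_list, tol = 2) {
  len <- length(mat_list)
  if (len < 2) return(mat_list)
  dup <- logical(len)                # dup[j] becomes TRUE once matrix j matches an earlier one
  for (i in seq_len(len - 1)) {
    for (j in (i + 1):len) {
      if (!dup[j] && norm(mat_list[[i]] - mat_list[[j]], "i") < tol) {
        dup[j] <- TRUE
      }
    }
  }
  mat_list[!dup]                     # keep the first representative of each group
}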

What methods do you know of, or what ideas do you have, for identifying and removing matrices that are within machine error of being duplicates?

Daniel Watkins
  • Can you post a couple of examples where you think the answer is "equal" and "not equal", and specify why? At the moment you have "datasource" but no one else does. We encourage the use of `dput(mat)`. – IRTFM Apr 29 '13 at 23:55

2 Answers


Here is a function that applies all.equal to every pair using outer and removes all duplicates:

approx.unique <- function(l) {
   # TRUE if matrices i and j are equal up to numerical tolerance
   is.equal.fun <- function(i, j) isTRUE(all.equal(norm(l[[i]] - l[[j]], "M"), 0))
   # pairwise comparison over all index pairs
   is.equal.mat <- outer(seq_along(l), seq_along(l), Vectorize(is.equal.fun))
   # an element is a duplicate if it equals any earlier element
   is.duplicate <- colSums(is.equal.mat * upper.tri(is.equal.mat)) > 0
   l[!is.duplicate]
}

An example:

a <- matrix(runif(12), 4, 3)
b <- matrix(runif(12), 4, 3)
c <- matrix(runif(12), 4, 3)

all <- list(a1 = a, b1 = b, a2 = a, a3 = a, b2 = b, c1 = c)

names(approx.unique(all))
# [1] "a1" "b1" "c1"
flodel
  • That is an interesting approach; I wasn't aware of the `all.equal` function. It does seem to use a lot of memory: try using it on a set of matrices of dimension 360x300 and you'll see what I mean. For example: `M1 <- matrix(data = 0, nrow = 360, ncol = 300); M2 <- M1 + rep(1e-17, 300); all <- list(M1, M2); all <- rep(all, 30)`. This should recognize that they are all a zero matrix, but my computer (with 12 GB of RAM) can't handle it. – Daniel Watkins Apr 30 '13 at 16:45
  • @D_Watkins, I have slightly edited my answer. The change should make better use of memory. – flodel Apr 30 '13 at 17:05
  • Why not just `is.equal.fun <- function(i, j) all.equal(norm(l[[i]] - l[[j]], "M"), 0)`? – Rcoster Apr 30 '13 at 17:16
  • You need to wrap it inside `isTRUE`; see `?all.equal`, and see what `all.equal(1, 2)` gives (illustrated just below)... – flodel Apr 30 '13 at 17:40
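
A quick demonstration of that last point (all.equal returns a descriptive string rather than FALSE when the objects differ):

all.equal(1, 2)           # "Mean relative difference: 1" -- a character string, not FALSE
isTRUE(all.equal(1, 2))   # FALSE -- safe to use inside if()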

I believe you are looking for all.equal, which compares objects 'within machine error'. Check out ?all.equal.
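
A minimal illustration of the idea (the matrices here are made up):

m1 <- matrix(1, 3, 3)
m2 <- m1 + 1e-12                # within machine error of m1
identical(m1, m2)               # FALSE -- exact comparison
isTRUE(all.equal(m1, m2))       # TRUE  -- equal within numerical tolerance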

Bryan Hanson