
I want to count, for each row of a matrix, the elements that fall within a desired range, with the added condition that only the first n such elements per row are considered.

A similar question, without the added condition, appears here:

counting N occurrences within a ceiling range of a matrix by-row

I have written R code to do what I want, but it uses nested for-loops. I have also replaced the nested for-loops with sapply statements, but they also appear inefficient.

I am hoping someone can suggest a more efficient approach, ideally in base R. Below I provide an example data set, my desired output, and working annotated R code.

Here is an example data set. My actual data sets will be much larger and I will have an enormous number of them. So, efficiency is important.

my.data  <-  matrix( c(0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      74, 22, 12, 13, 56,  0,  0,  0,  0,  0,
                      88, 77,  5, 77, 34, 98,  0,  0,  0,  0,
                      92,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      89,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      86, 72, 64, 40, 75, 58, 28, 66, 13, 98,
                      18,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      70, 51, 83, 13, 50, 30,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      28, 54, 43, 86, 50,  0,  0,  0,  0,  0,
                      45, 83,  0,  0,  0,  0,  0,  0,  0,  0,
                      39, 57, 58, 90, 84, 47, 36,  0,  0,  0,
                      76, 14, 71, 29,  0,  0,  0,  0,  0,  0,
                      23,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                       7,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      77, 58, 90, 91, 47, 40, 58, 89,  0,  0,
                      89, 90,  0,  0,  0,  0,  0,  0,  0,  0,
                      83, 34, 61,  0,  0,  0,  0,  0,  0,  0,
                      17,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      62,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      10, 42,  5, 87, 61,  0,  0,  0,  0,  0,
                      90, 39, 99, 10, 84, 90, 93, 96, 69,  0,
                      84, 40, 44, 82,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0),
nrow = 25, ncol = 10, byrow = TRUE)

Here are my desired results. I read each row from left-to-right and 0's are ignored.

# These are the number of elements per row that satisfy all conditions
desired.n.kept      <- c(0, 2, 3, 0, 0, 3, 0, 0, 3, 0, 3, 2, 3, 2, 0, 0, 3, 0, 3, 0, 1, 2, 3, 3, 0)

# These are the number of elements per row that do not satisfy all conditions 
# up through the specified limit on number of elements that do satisfy all conditions
desired.n.discarded <- c(0, 3, 2, 1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 0, 1, 0, 3, 6, 0, 0)

Here I explain several example rows.

In the first row no elements satisfy all conditions; in other words, no elements are >= 30 & <= 85. No elements count as failing the conditions either, because 0's are ignored. So the out-of-range test < 30 | > 85 is better thought of as: (> 0 & < 30) | > 85.

In the third row three elements are within the desired range. These are the second, fourth and fifth elements, the two 77's and the 34 because they are >= 30 & <= 85. Two elements (the 88 and the 5) are outside the desired range [(> 0 & < 30) | > 85] to the left of the third element that is within the desired range, i.e., to the left of the fifth element, the 34. The sixth element, the 98, occurs after the limit of 3 kept elements has been reached, i.e., after the two 77's and the 34. So, the sixth element, the 98, is ignored.

In the sixth row three elements satisfy all conditions: the 72, 64 and 40. These three elements are the first three to fall within the desired range: >= 30 & <= 85. One element, the 86, does not satisfy all conditions (it is > 85) up through the third element that is kept, i.e., up through the 40. Because the 40 is the third element to fall within the desired range (>= 30 & <= 85) all six elements to the right of the 40 are ignored regardless of whether they fall within or outside the desired range (the 75, 58, 28, 66, 13, and 98 are ignored).
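The row-by-row explanations above can be checked with a minimal single-row sketch. The helper name count.row and its arguments are illustrative assumptions, not part of the original code; the thresholds default to the values used in this question.

```r
# Hypothetical helper (illustrative only): counts for a single row.
count.row <- function(x, lo = 30, hi = 85, limit = 3) {
  inside  <- x >= lo & x <= hi   # within the desired range
  outside <- x > 0 & !inside     # non-zero and out of range; 0's are ignored
  # position of the limit-th in-range element, or the row end if never reached
  stop.at <- match(limit, cumsum(inside), nomatch = length(x))
  keep    <- seq_along(x) <= stop.at
  c(kept = sum(inside & keep), discarded = sum(outside & keep))
}

row3 <- c(88, 77,  5, 77, 34, 98,  0,  0,  0,  0)
row6 <- c(86, 72, 64, 40, 75, 58, 28, 66, 13, 98)
count.row(row3)  # kept = 3, discarded = 2
count.row(row6)  # kept = 3, discarded = 1
```

Everything to the right of the third in-range element is masked out by keep, which is why the 98 in the third row and the trailing six values in the sixth row do not affect either count.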

Here is my initial code using nested for-loops:

# specify the desired range for individual elements
my.min   <- 30
my.max   <- 85

# specify maximum number of elements to keep within desired range per row
my.limit <- 3

my.cols  <- ncol(my.data)
my.rows  <- nrow(my.data)

# indicator matrix identifies elements inside the desired range
in.range <- matrix(0, nrow = my.rows, ncol = my.cols)
in.range[my.data >= my.min & my.data <= my.max] <- 1

# indicator matrix identifies elements outside the desired range
outside.range <- matrix(0, nrow = my.rows, ncol = my.cols)
outside.range[my.data > 0 & (my.data < my.min | my.data > my.max)] <- 1

# count elements that are within the desired range
count.in.range <- t(apply(in.range, 1, cumsum))

# truncate rows after my limit is reached
truncate.rows <- matrix(1, nrow = my.rows, ncol = my.cols)
for(i in seq_len(my.rows)) {
     for(j in seq_len(my.cols)[-1]) {
          # && is the scalar operator intended inside if(); <- is the usual assignment
          if((count.in.range[i,(j-1)] >= my.limit) && (count.in.range[i,j] >= my.limit)) {truncate.rows[i,j] <- 0}
     }
}

# count the number of elements per row that satisfy all conditions
n.kept <- rowSums(truncate.rows * in.range)
# count the number of elements per row that do not satisfy all conditions
n.discarded <- rowSums(truncate.rows * outside.range)

# verify that my code returns the desired results
all.equal(n.kept, desired.n.kept)
#[1] TRUE
all.equal(n.discarded, desired.n.discarded)
#[1] TRUE

Here is the sapply function I wrote in place of nested for-loops. It does work but you can see it appears overly complex:

# This sapply approach returns a matrix with only 9 columns and many NULL elements
truncate.rows2 <- matrix(1, nrow = my.rows, ncol = my.cols)
truncate.rows2 <- t(sapply(1:my.rows, function (i) {
                       sapply(2:my.cols, function(j) {  
                            if((count.in.range[i,(j-1)] >= my.limit) & (count.in.range[i,j] >= my.limit)) {truncate.rows2[i,j] = 0}
                       })
                  }))
truncate.rows2

# modify truncate.rows2 to eliminate NULL elements and restore the first column
truncate.rows3 <- matrix(as.numeric(as.character(truncate.rows2)), ncol = (my.cols-1), nrow = my.rows)
truncate.rows3[is.na(truncate.rows3)] <- 1
truncate.rows3 <- cbind(truncate.rows[,1], truncate.rows3)
truncate.rows3

all.equal(truncate.rows, truncate.rows3)
#[1] TRUE
Mark Miller
  • I really don't understand your examples. Can you make your examples based on numbers (instead of 6th element, 3rd element) and avoid duplicated numbers? Maybe that makes understanding this easier. – M-- Oct 22 '19 at 22:07
  • I will edit to add the numeric values of elements in the three examples. – Mark Miller Oct 22 '19 at 22:09

1 Answer


I replaced the nested for-loop with the following code:

# determine last element to keep by row: the position of the my.limit-th
# in-range element, or the last column if the limit is never reached
# (match() avoids the warnings that min(which(...)) raises on empty matches)
last.one <- apply(count.in.range, 1, function(x) match(my.limit, x, nomatch = my.cols))

# indicator matrix of elements to keep
truncate.rows <- (col(count.in.range) <= last.one) * 1
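An alternative sketch that avoids apply() altogether: accumulate the in-range counts one column at a time (each step is vectorised over all rows), then keep an element when fewer than the limit of in-range elements precede it. The small matrix and the names x, lo, hi, limit, inside, outside, running and keep are assumptions made for illustration. Whether this is faster will depend on the matrix shape, so it is worth timing on real data.

```r
# Small made-up matrix: rows 2 and 3 of the example data (first six columns)
# plus an all-zero row.
x <- matrix(c(74, 22, 12, 13, 56,  0,
              88, 77,  5, 77, 34, 98,
               0,  0,  0,  0,  0,  0),
            nrow = 3, byrow = TRUE)
lo <- 30; hi <- 85; limit <- 3   # same thresholds as in the question

inside  <- x >= lo & x <= hi
outside <- x > 0 & !inside       # 0's are ignored

# Row-wise running count of in-range elements, built column by column.
running <- inside * 1
for (j in seq_len(ncol(running))[-1]) {
  running[, j] <- running[, j - 1] + running[, j]
}

# Keep an element when fewer than `limit` in-range elements precede it.
keep <- (running - inside) < limit

n.kept      <- rowSums(inside & keep)
n.discarded <- rowSums(outside & keep)
n.kept       # 2 3 0
n.discarded  # 3 2 0
```

The loop runs once per column rather than once per cell, so its cost grows with the number of columns (10 here), not with the number of rows.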

So, the complete functional code becomes:

my.data  <-  matrix( c(0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      74, 22, 12, 13, 56,  0,  0,  0,  0,  0,
                      88, 77,  5, 77, 34, 98,  0,  0,  0,  0,
                      92,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      89,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      86, 72, 64, 40, 75, 58, 28, 66, 13, 98,
                      18,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      70, 51, 83, 13, 50, 30,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      28, 54, 43, 86, 50,  0,  0,  0,  0,  0,
                      45, 83,  0,  0,  0,  0,  0,  0,  0,  0,
                      39, 57, 58, 90, 84, 47, 36,  0,  0,  0,
                      76, 14, 71, 29,  0,  0,  0,  0,  0,  0,
                      23,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                       7,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      77, 58, 90, 91, 47, 40, 58, 89,  0,  0,
                      89, 90,  0,  0,  0,  0,  0,  0,  0,  0,
                      83, 34, 61,  0,  0,  0,  0,  0,  0,  0,
                      17,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      62,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                      10, 42,  5, 87, 61,  0,  0,  0,  0,  0,
                      90, 39, 99, 10, 84, 90, 93, 96, 69,  0,
                      84, 40, 44, 82,  0,  0,  0,  0,  0,  0,
                       0,  0,  0,  0,  0,  0,  0,  0,  0,  0),
nrow = 25, ncol = 10, byrow = TRUE)

desired.n.kept      <- c(0, 2, 3, 0, 0, 3, 0, 0, 3, 0, 3, 2, 3, 2, 0, 0, 3, 0, 3, 0, 1, 2, 3, 3, 0)
desired.n.discarded <- c(0, 3, 2, 1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 0, 1, 0, 3, 6, 0, 0)

# specify desired range for individual elements
my.min   <- 30
my.max   <- 85

# specify maximum number of elements to keep within desired range per row
my.limit <- 3

my.cols  <- ncol(my.data)
my.rows  <- nrow(my.data)

# indicator matrix identifies elements inside the desired range
in.range <- matrix(0, nrow = my.rows, ncol = my.cols)
in.range[my.data >= my.min & my.data <= my.max] <- 1

# indicator matrix identifies elements outside the desired range
outside.range <- matrix(0, nrow = my.rows, ncol = my.cols)
outside.range[my.data > 0 & (my.data < my.min | my.data > my.max)] <- 1

# count elements that are within the desired range
count.in.range <- t(apply(in.range, 1, cumsum))

# determine last element to keep by row: the position of the my.limit-th
# in-range element, or the last column if the limit is never reached
# (match() avoids the warnings that min(which(...)) raises on empty matches)
last.one <- apply(count.in.range, 1, function(x) match(my.limit, x, nomatch = my.cols))

# indicator matrix of elements to keep
truncate.rows <- (col(count.in.range) <= last.one) * 1

# count the number of elements per row that satisfy all conditions
n.kept <- rowSums(truncate.rows * in.range)
# count the number of elements per row that do not satisfy all conditions
n.discarded <- rowSums(truncate.rows * outside.range)

# verify that my code returns the desired results
all.equal(n.kept, desired.n.kept)
#[1] TRUE
all.equal(n.discarded, desired.n.discarded)
#[1] TRUE
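Since efficiency is the goal, it may help to confirm on larger random input that a rewrite gives the same answer as the nested loops and to time both. The sketch below is one way to do that; the function names count.loop and count.mask, the simulated matrix big and its size are all assumptions for illustration, and timings will vary by machine.

```r
# Hypothetical wrappers (illustrative names, not from the post).
count.loop <- function(x, lo, hi, limit) {
  inside  <- x >= lo & x <= hi
  outside <- x > 0 & !inside
  running <- t(apply(inside, 1, cumsum))
  keep <- matrix(TRUE, nrow(x), ncol(x))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))[-1]) {
      # drop column j once the limit was reached in an earlier column
      if (running[i, j - 1] >= limit) keep[i, j] <- FALSE
    }
  }
  list(kept = rowSums(inside & keep), discarded = rowSums(outside & keep))
}

count.mask <- function(x, lo, hi, limit) {
  inside  <- x >= lo & x <= hi
  outside <- x > 0 & !inside
  running <- t(apply(inside, 1, cumsum))
  # column of the limit-th in-range element, or the last column if not reached
  last.one <- apply(running, 1, function(r) match(limit, r, nomatch = ncol(x)))
  keep <- col(x) <= last.one
  list(kept = rowSums(inside & keep), discarded = rowSums(outside & keep))
}

set.seed(1)
big <- matrix(sample(0:99, 2000 * 10, replace = TRUE), ncol = 10)

# Both versions should agree exactly before any timing is trusted.
stopifnot(identical(count.loop(big, 30, 85, 3), count.mask(big, 30, 85, 3)))

system.time(count.loop(big, 30, 85, 3))
system.time(count.mask(big, 30, 85, 3))
```

The simulated data has zeros scattered through each row rather than only trailing zeros, which is a harder case than the example data but is handled the same way by both versions.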