I wish to count the number of times an element within a desired range appears in each row of a matrix
with the added condition that I only want to consider the first n such elements per row.
A similar question, without the added condition, appears here:
counting N occurrences within a ceiling range of a matrix by-row
I have written R code to do what I want, but it uses nested for-loops. I have also replaced the nested for-loops with sapply statements, but they also appear inefficient. I am hoping someone might suggest a more efficient approach, ideally in base R. I provide an example data set, my desired output and functional annotated R code below.
Here is an example data set. My actual data sets will be much larger and I will have an enormous number of them. So, efficiency is important.
my.data <- matrix( c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
74, 22, 12, 13, 56, 0, 0, 0, 0, 0,
88, 77, 5, 77, 34, 98, 0, 0, 0, 0,
92, 0, 0, 0, 0, 0, 0, 0, 0, 0,
89, 0, 0, 0, 0, 0, 0, 0, 0, 0,
86, 72, 64, 40, 75, 58, 28, 66, 13, 98,
18, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
70, 51, 83, 13, 50, 30, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
28, 54, 43, 86, 50, 0, 0, 0, 0, 0,
45, 83, 0, 0, 0, 0, 0, 0, 0, 0,
39, 57, 58, 90, 84, 47, 36, 0, 0, 0,
76, 14, 71, 29, 0, 0, 0, 0, 0, 0,
23, 0, 0, 0, 0, 0, 0, 0, 0, 0,
7, 0, 0, 0, 0, 0, 0, 0, 0, 0,
77, 58, 90, 91, 47, 40, 58, 89, 0, 0,
89, 90, 0, 0, 0, 0, 0, 0, 0, 0,
83, 34, 61, 0, 0, 0, 0, 0, 0, 0,
17, 0, 0, 0, 0, 0, 0, 0, 0, 0,
62, 0, 0, 0, 0, 0, 0, 0, 0, 0,
10, 42, 5, 87, 61, 0, 0, 0, 0, 0,
90, 39, 99, 10, 84, 90, 93, 96, 69, 0,
84, 40, 44, 82, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
nrow = 25, ncol = 10, byrow = TRUE)
Here are my desired results. I read each row from left to right, and 0's are ignored.
# These are the number of elements per row that satisfy all conditions
desired.n.kept <- c(0, 2, 3, 0, 0, 3, 0, 0, 3, 0, 3, 2, 3, 2, 0, 0, 3, 0, 3, 0, 1, 2, 3, 3, 0)
# These are the number of elements per row that do not satisfy all conditions
# up through the specified limit on number of elements that do satisfy all conditions
desired.n.discarded <- c(0, 3, 2, 1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 0, 1, 0, 3, 6, 0, 0)
Here I explain several example rows.
In the first row there are no elements that satisfy all conditions. In other words, there are no elements that are >= 30 & <= 85. There are also no elements that do not satisfy all conditions (< 30 | > 85), keeping in mind that 0's are ignored. So, < 30 | > 85 might better be thought of as: (> 0 & < 30) | > 85.
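To make the zero handling concrete, here is a minimal sketch on a made-up vector. The names x, inside, outside and naive.outside are illustrative assumptions and are not part of the code further below.

```r
# made-up row: zeros are padding and must not count as "outside"
x <- c(0, 12, 30, 85, 86, 0)

# inside the desired range (bounds are inclusive)
inside <- x >= 30 & x <= 85

# outside the desired range, with zeros excluded
outside <- x > 0 & (x < 30 | x > 85)

# the naive condition also flags the two zeros
naive.outside <- x < 30 | x > 85

sum(inside)         # 2  (the 30 and the 85)
sum(outside)        # 2  (the 12 and the 86)
sum(naive.outside)  # 4  (the 12, the 86 and both zeros)
```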
In the third row three elements are within the desired range. These are the second, fourth and fifth elements, the two 77's and the 34, because they are >= 30 & <= 85. Two elements (the 88 and the 5) are outside the desired range [(> 0 & < 30) | > 85] to the left of the third element that is within the desired range, i.e., to the left of the fifth element, the 34. The sixth element, the 98, occurs after the limit of 3 kept elements has been reached, i.e., after the two 77's and the 34. So, the sixth element, the 98, is ignored.
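The third-row walk-through can be reproduced on that single row. This is only a sketch for one row, not an efficient solution for the whole matrix; row3, inside3, outside3 and cutoff are hypothetical names introduced for the illustration.

```r
# third row of my.data, copied by hand
row3 <- c(88, 77, 5, 77, 34, 98, 0, 0, 0, 0)
limit <- 3

inside3  <- row3 >= 30 & row3 <= 85
outside3 <- row3 > 0 & !inside3

# position of the element at which the limit is reached;
# if the limit is never reached, use the whole row
cutoff <- which(cumsum(inside3) == limit)[1]
if (is.na(cutoff)) cutoff <- length(row3)

cutoff                              # 5, i.e. the 34
sum(inside3[seq_len(cutoff)])       # 3 kept: 77, 77, 34
sum(outside3[seq_len(cutoff)])      # 2 discarded: 88, 5
```

The 98 sits at position 6, beyond cutoff, so it contributes to neither count.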
In the sixth row three elements satisfy all conditions: the 72, 64 and 40. These three elements are the first three to fall within the desired range: >= 30 & <= 85. One element, the 86, does not satisfy all conditions (it is > 85) up through the third element that is kept, i.e., up through the 40. Because the 40 is the third element to fall within the desired range (>= 30 & <= 85), all six elements to the right of the 40 are ignored regardless of whether they fall within or outside the desired range (the 75, 58, 28, 66, 13, and 98 are ignored).
Here is my initial code using nested for-loops:
# specify the desired range for individual elements
my.min <- 30
my.max <- 85
# specify maximum number of elements to keep within desired range per row
my.limit <- 3
my.cols <- ncol(my.data)
my.rows <- nrow(my.data)
# indicator matrix identifies elements inside the desired range
in.range <- matrix(0, nrow = my.rows, ncol = my.cols)
in.range[my.data >= my.min & my.data <= my.max] <- 1
# indicator matrix identifies elements outside the desired range
outside.range <- matrix(0, nrow = my.rows, ncol = my.cols)
outside.range[my.data > 0 & (my.data < my.min | my.data > my.max)] <- 1
# count elements that are within the desired range
count.in.range <- t(apply(in.range, 1, cumsum))
# truncate rows after my limit is reached
truncate.rows <- matrix(1, nrow = my.rows, ncol = my.cols)
for (i in 1:my.rows) {
  for (j in 2:my.cols) {
    # drop column j once the limit was already reached at column j - 1
    if ((count.in.range[i, j - 1] >= my.limit) && (count.in.range[i, j] >= my.limit)) {
      truncate.rows[i, j] <- 0
    }
  }
}
# count the number of elements per row that satisfy all conditions
n.kept <- rowSums(truncate.rows * in.range)
# count the number of elements per row that do not satisfy all conditions
n.discarded <- rowSums(truncate.rows * outside.range)
# verify that my code returns the desired results
all.equal(n.kept, desired.n.kept)
#[1] TRUE
all.equal(n.discarded, desired.n.discarded)
#[1] TRUE
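Because the running count never decreases, the test inside the nested loops reduces to "the count before column j is already at the limit". Under that observation, the truncation step can be sketched without the double loop by shifting the cumulative counts one column to the right. This is a sketch on four rows copied from my.data (rows 2, 3, 6 and 23), not a benchmarked solution; all demo.* names, in.rng, out.rng, run.count, prev.count and live are assumptions made for the illustration.

```r
# rows 2, 3, 6 and 23 of my.data, copied by hand
demo.data <- matrix(c(74, 22, 12, 13, 56,  0,  0,  0,  0,  0,
                      88, 77,  5, 77, 34, 98,  0,  0,  0,  0,
                      86, 72, 64, 40, 75, 58, 28, 66, 13, 98,
                      90, 39, 99, 10, 84, 90, 93, 96, 69,  0),
                    nrow = 4, ncol = 10, byrow = TRUE)

demo.min   <- 30
demo.max   <- 85
demo.limit <- 3

# logical indicator matrices; zeros are excluded from "outside"
in.rng  <- demo.data >= demo.min & demo.data <= demo.max
out.rng <- demo.data > 0 & !in.rng

# running count of in-range elements within each row
run.count <- t(apply(in.rng, 1, cumsum))

# count *before* each column: shift right by one column and pad with 0
prev.count <- cbind(0, run.count[, -ncol(run.count), drop = FALSE])

# a column is still considered while fewer than demo.limit
# in-range elements have been seen to its left
live <- prev.count < demo.limit

demo.kept      <- rowSums(live & in.rng)
demo.discarded <- rowSums(live & out.rng)

demo.kept       # 2 3 3 3
demo.discarded  # 3 2 1 6
```

These values match the corresponding entries of desired.n.kept and desired.n.discarded. The row-wise cumsum still goes through apply here, so whether this is actually faster on large matrices would need to be timed.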
Here is the sapply function I wrote in place of nested for-loops. It does work but you can see it appears overly complex:
# This sapply approach returns a matrix with only 9 columns and many NULL elements
truncate.rows2 <- matrix(1, nrow = my.rows, ncol = my.cols)
truncate.rows2 <- t(sapply(1:my.rows, function (i) {
sapply(2:my.cols, function(j) {
if((count.in.range[i,(j-1)] >= my.limit) & (count.in.range[i,j] >= my.limit)) {truncate.rows2[i,j] = 0}
})
}))
truncate.rows2
# modify truncate.rows2 to eliminate NULL elements and restore the first column
truncate.rows3 <- matrix(as.numeric(as.character(truncate.rows2)), ncol = (my.cols-1), nrow = my.rows)
truncate.rows3[is.na(truncate.rows3)] <- 1
truncate.rows3 <- cbind(truncate.rows[,1], truncate.rows3)
truncate.rows3
all.equal(truncate.rows, truncate.rows3)
#[1] TRUE
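The NULL elements appear because the if inside the inner sapply has no else branch, so it returns NULL whenever the condition is FALSE, and the first column is missing because the inner sapply starts at column 2. A sketch that returns 0 or 1 explicitly and prepends the first column avoids the clean-up step. It uses a small made-up count matrix; demo.count, demo.limit and demo.trunc are hypothetical names, and this only tidies the sapply version rather than making it efficient.

```r
# made-up running counts for two rows
demo.count <- matrix(c(0, 1, 2, 3, 4,
                       1, 1, 1, 2, 2),
                     nrow = 2, byrow = TRUE)
demo.limit <- 3

demo.trunc <- t(sapply(seq_len(nrow(demo.count)), function(i) {
  # the first column is always kept; later columns return 0 or 1 explicitly
  c(1, sapply(2:ncol(demo.count), function(j) {
    if (demo.count[i, j - 1] >= demo.limit && demo.count[i, j] >= demo.limit) 0 else 1
  }))
}))

demo.trunc
# row 1: 1 1 1 1 0   (limit reached at column 4, so column 5 is dropped)
# row 2: 1 1 1 1 1   (limit never reached)
```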