I have a vector V with n elements, each an integer between 1 and N. Given this vector, I'd like to construct an N×n matrix W in which column i contains the frequencies of the integers between 1 and N as they appear in the subvector V[1:i].
For example, suppose N=5 and n=7, and V=c(3,1,4,1,2,1,4). Then my matrix W would have elements
0,1,1,2,2,3,3
0,0,0,0,1,1,1
1,1,1,1,1,1,1
0,0,1,1,1,1,2
0,0,0,0,0,0,0
because integer 1 (first row) appears: 0 times in V[1], once in V[1:2], once in V[1:3], twice in V[1:4], twice in V[1:5], three times in V[1:6], three times in V[1:7], etc.
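Put differently, entry W[k, i] is the number of positions j ≤ i with V[j] == k. A minimal sketch checking a single entry of the example above (the names k, i and count_ki are only illustrative, not part of the original code):

```r
V <- c(3, 1, 4, 1, 2, 1, 4)

k <- 1  # integer being counted (row of W), chosen for illustration
i <- 4  # prefix length (column of W), chosen for illustration

# number of times k appears in the prefix V[1:i]
count_ki <- sum(V[seq_len(i)] == k)
print(count_ki)
```

This should agree with the entry in row 1, column 4 of the matrix above.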
I could do this with a for loop, using table and factor, for example:
N <- 5
n <- 7
V <- c(3, 1, 4, 1, 2, 1, 4)
W <- matrix(NA, N, n)
for (i in 1:n) {
  # counts of each integer 1..N within the prefix V[1:i]
  W[, i] <- as.vector(table(factor(V[1:i], levels = 1:N)))
}
which in fact gives
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    1    2    2    3    3
[2,]    0    0    0    0    1    1    1
[3,]    1    1    1    1    1    1    1
[4,]    0    0    1    1    1    1    2
[5,]    0    0    0    0    0    0    0
But I wonder if there's some cleverer, faster way that doesn't use a for loop: my N and n are of the order of 100 or 1000.
Any other insight to improve the code above is also welcome (my knowledge of R is still very basic).
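One loop-free direction, given only as a sketch: build an N×n indicator matrix with outer and take a running sum along each row with cumsum. The names ind and W2 are made up for illustration, and the speed for N and n around 1000 has not been measured here. Note that apply still iterates over rows internally, and it returns a plain vector rather than a matrix when n is 1.

```r
N <- 5
V <- c(3, 1, 4, 1, 2, 1, 4)

# ind[k, i] is TRUE exactly when V[i] == k
ind <- outer(seq_len(N), V, "==")

# running count along each row; apply() returns one column per row of ind,
# so the result is transposed back to N x n
W2 <- t(apply(ind, 1, cumsum))
print(W2)
```

For the example data this should reproduce the matrix produced by the for loop.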
Cheers!