Vectorizing over the output of Hmisc::partition.matrix

Question

I have been trying to simulate a binomial process that occurs over several weeks, for different pool sizes. I have 7 different pool sizes, and probabilities of the positive outcome for each week. I have simulated a Bernoulli process, which gives me a fixed number of positive outcomes each week. I want to allocate each of these positive outcomes to a pool using the partition.matrix function from Hmisc, as below:

library(Hmisc)
set.seed(1)

# vector of pool sizes
pool_sizes <- c(10,15,25,35,40,45,50)

# Vector of weekly probabilities
weekly_probabilities <- c(0.12,0.37,0.68,0.43,0.2)

# weekly bernoulli trials
pos_neg <- sapply(weekly_probabilities,function(x)rbinom(sum(pool_sizes),size = 1, p = x))

# Assigning outcomes to pools
pos_neg_pools <- partition.matrix(pos_neg,rowsep = pool_sizes,colsep = 5)

I am running into problems while getting the number of outcomes each week, for each pool. I tried the mapply function, but I get an error saying Error in .Primitive("sum")(dots[[1L]][[1L]]) : invalid 'type' (list) of argument. Since pos_neg_pools has class list, I tried applying lapply twice:

lapply(pos_neg_pools,function(x)apply(x,2,sum))

but got this error:

Error in apply(x, 2, sum) : dim(X) must have a positive length

The issue appears to be that dim(pos_neg_pools[1]) and dim(pos_neg_pools[[1]]) are NULL, and each element of pos_neg_pools can only be accessed as pos_neg_pools$`1`$`1. I am not sure how to vectorize over this output, and I don't want to use a for loop, which might be cumbersome if I have a few thousand pools, and 52 weeks of data for each.

How can I go about this? Any help is appreciated.

score 1 · Accepted Answer · answered Dec 04 '19 at 12:27

You have a list of list, and the split matrix is always under the first subelement:

 str(pos_neg_pools)
List of 7
 $ 1:List of 1
  ..$ 1: int [1:10, 1:5] 0 0 0 1 0 1 1 0 0 0 ...
 $ 2:List of 1
  ..$ 1: int [1:15, 1:5] 0 0 0 0 0 0 0 1 0 0 ...
 $ 3:List of 1
  ..$ 1: int [1:25, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ 4:List of 1
  ..$ 1: int [1:35, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ 5:List of 1
  ..$ 1: int [1:40, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ 6:List of 1
  ..$ 1: int [1:45, 1:5] 0 0 0 0 0 0 0 0 0 1 ...
 $ 7:List of 1
  ..$ 1: int [1:50, 1:5] 0 0 0 0 0 1 0 0 0 1 ...

This is the dimension of each matrix:

sapply(pos_neg_pools,function(x)dim(x[[1]]))
      1  2  3  4  5  6  7
[1,] 10 15 25 35 40 45 50
[2,]  5  5  5  5  5  5  5

In the code posted, you are applying your functions on a list. You need to go one level below with:

sapply(pos_neg_pools,function(x)colSums(x[[1]]))

     1  2  3  4  5  6  7
[1,] 3  2  0  3  4  4  9
[2,] 2  4  7  6 17 16 23
[3,] 9 13 14 24 25 29 34
[4,] 5  5 12 13 18 17 21
[5,] 4  1  5  8  8 10 10

or:

 lapply(pos_neg_pools,function(x)colSums(x[[1]]))

To avoid this weird list of list, since you don't need to split your columns, do:

X = partition.matrix(pos_neg,rowsep = pool_sizes)
sapply(X,colSums)

     1  2  3  4  5  6  7
[1,] 3  2  0  3  4  4  9
[2,] 2  4  7  6 17 16 23
[3,] 9 13 14 24 25 29 34
[4,] 5  5 12 13 18 17 21
[5,] 4  1  5  8  8 10 10

Vectorizing over the output of Hmisc::partition.matrix

1 Answers1