2

I want to iterate over different matrix blocks based on an index variable. You can think of it as how you would compute the individual contributions to the loglikelihood of the different individuals on a model that uses panel data. That being said, I want it to be as fast as it can be.

I've already read some questions related to it. But none of them answer my question directly. For example, What is the recommended way to iterate a matrix over rows? shows ways to run over the WHOLE bunch of rows not iterating over blocks of rows. Additionally, Julia: loop over rows of matrix (or not) is also about how to iterate over every row again and not over blocks of them.

So here is my question. Say you have X, which is a 2x9 matrix and an id variable that indexes the individuals in the sample. I want to iterate over them to construct my loglikelihood contributions as fast as possible. I did it here just by slicing the matrix using booleans, but this seems relatively inefficient given I am for each individual checking the entire vector to see if they match or not.

id = [1 1 2 2 2 2 3 3 3 ]
x1 = [1 2 3 4 5 6 7 8 9]
x2 = [10 11 12 13 14 15 16 17 18]
X = transpose([x1 ; x2 ])
# 9×2 transpose(::Matrix{Int64}) with eltype Int64:
# 1  10
# 2  11
# 3  12
# 4  13
# 5  14
# 6  15
# 7  16
# 8  17
# 9  18

#Iteraring over the block of observations. 
for n in unique(id)
  select_row = vec(isone.(n.==transpose(id)))
  X_n = X[ select_row , : ]
  print(X_n)
end
#[1 10; 2 11]
#[3 12; 4 13; 5 14; 6 15]
#[7 16; 8 17; 9 18]

In other languages (see below), I created a matrix that contains each individual's positions beforehand (outside of the loop) and then index my regressors based on that. Still, I don't know how to do it in Julia (or if there is a function for it), and I also don't know if this is the fastest way to go in this language.


Mata example

For example, just for the sake of illustration, in Mata (matrix language from Stata) I would do the following:

mata:
id = 1 \1 \2 \2 \2 \2 \3 \ 3\ 3 
x1 = 1 \2 \3 \4 \5 \6 \7 \ 8\ 9
x2 = 10 \11 \12 \13 \14 \15 \16 \ 17\ 18
X  = x1 , x2
/*this indicate where is each invididual*/
paninfo = panelsetup(id,1) 
paninfo
//       1   2
//    +---------+
//  1 |  1   2  |
//  2 |  3   6  |
//  3 |  7   9  |
//    +---------+
/*this gave me the total number of individuals*/
npanels = panelstats(paninfo)[1]  
npanels 
// 3
for(n=1; n <= npanels; ++n) {
    /* this selects the block of variables for individual n*/
    x_n = panelsubmatrix(X, n, paninfo)  
    x_n 
}
//        1    2
//    +-----------+
//  1 |   1   10  |
//  2 |   2   11  |
//    +-----------+
//        1    2
//    +-----------+
//  1 |   3   12  |
//  2 |   4   13  |
//  3 |   5   14  |
//  4 |   6   15  |
//    +-----------+
//        1    2
//    +-----------+
//  1 |   7   16  |
//  2 |   8   17  |
//  3 |   9   18  |
//    +-----------+
end 

1 Answers1

1

First, I would recommend you to use vectors instead of matrices:

julia> id = [1, 1, 2, 2, 2, 2, 3, 3, 3]
9-element Vector{Int64}:
 1
 1
 2
 2
 2
 2
 3
 3
 3

julia> x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
9-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
 7
 8
 9

julia> x2 = [10, 11, 12, 13, 14, 15, 16, 17, 18]
9-element Vector{Int64}:
 10
 11
 12
 13
 14
 15
 16
 17
 18

julia> X = [x1 x2]
9×2 Matrix{Int64}:
 1  10
 2  11
 3  12
 4  13
 5  14
 6  15
 7  16
 8  17
 9  18

Now the relatively easy way to get grouping of X on id in your context is to use SplitApplyCombine.jl:

julia> res = map(x -> view(X, x, :), groupfind(id))
3-element Dictionaries.Dictionary{Int64, SubArray}
 1 │ [1 10; 2 11]
 2 │ [3 12; 4 13; 5 14; 6 15]
 3 │ [7 16; 8 17; 9 18]

julia> res[1]
2×2 view(::Matrix{Int64}, [1, 2], :) with eltype Int64:
 1  10
 2  11

julia> res[2]
4×2 view(::Matrix{Int64}, [3, 4, 5, 6], :) with eltype Int64:
 3  12
 4  13
 5  14
 6  15

julia> res[3]
3×2 view(::Matrix{Int64}, [7, 8, 9], :) with eltype Int64:
 7  16
 8  17
 9  18

I use views to improve performance of the operation.

Bogumił Kamiński
  • 66,844
  • 3
  • 80
  • 107
  • Thank you very much for your answer, @BogumiłKamiński. I just checked and I got a warning saying that `groupinds()` is deprecated (`SplitApplyCombine.jl` version `v1.2.1`)and I should use **`groupfind()`** instead (both produce the same result, of course) – Álvaro A. Gutiérrez-Vargas Feb 02 '22 at 15:22
  • 1
    Ah - strange. Now that I restarted Julia I see the same warning (not sure why I have not gotten it before). I will update the answer. – Bogumił Kamiński Feb 02 '22 at 15:25
  • Regarding the original question. You will tackle the problem by basically looping over what is inside of the `res` object? For instance: `for n in 1:length(res) foo(res[n]) end` where `foo()` creates the log-likelihood contributions for individual `n`, right? – Álvaro A. Gutiérrez-Vargas Feb 02 '22 at 15:34
  • 1
    Yes, or more simply e.g. `mapreduce(foo, +, values(res))` (this will compute and add the log-likelihoods in one shot). I have not commented on this part as you have not specified the code you want to use on the groups. – Bogumił Kamiński Feb 02 '22 at 16:18