I wanted to summarize the answers and comments by benchmarking each proposed solution.
using LinearAlgebra, BenchmarkTools
A = rand(1000, 1000)
B = [i == j for i ∈ 1:1000, j ∈ 1:1000]
B2 = BitArray(B)
CI = CartesianIndex.(1:1000, 1:1000)
step = size(A, 1) + 1
@btime A[[i == j for i ∈ 1:1000, j ∈ 1:1000]]
2.622 ms (3 allocations: 984.64 KiB)
@btime A[B]
1.806 ms (1 allocation: 7.94 KiB)
@btime A[(1:1000) .== (1:1000)']
509.597 μs (5 allocations: 134.38 KiB)
@btime A[B2]
35.067 μs (1 allocation: 7.94 KiB)
@btime A[[CartesianIndex(i,i) for i in 1:1000]]
6.051 μs (2 allocations: 23.69 KiB)
@btime A[CartesianIndex.(1:1000, 1:1000)]
5.769 μs (2 allocations: 23.69 KiB)
@btime A[CI]
4.093 μs (1 allocation: 7.94 KiB)
@btime A[1:size(A,1)+1:end]
2.123 μs (4 allocations: 8.00 KiB)
@btime A[1:step:end]
2.073 μs (3 allocations: 7.98 KiB)
@btime A[diagind(A)]
2.036 μs (2 allocations: 7.97 KiB)
Using linear indices is the fastest solution: all three variants take about 2 μs, with the built-in diagind
marginally ahead. Accessing off-diagonal elements is probably easier with CartesianIndex,
which still performs quite well at roughly 4 to 6 μs.
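
To make the two indexing styles concrete, here is a minimal sketch on a small 4×4 matrix (chosen only so the values are easy to check; it is not the 1000×1000 matrix benchmarked above). It shows why a step of size(A, 1) + 1 walks the main diagonal, and how the first superdiagonal could be read with either CartesianIndex or diagind. The names n, k, via_cartesian and via_linear are made up for this illustration.

```julia
using LinearAlgebra

# Small matrix whose entries equal their own linear index: A[i] == i.
A = collect(reshape(1.0:16.0, 4, 4))
n = size(A, 1)

# Julia arrays are column-major, so element (i, i) of an n×n matrix sits at
# linear index (i - 1) * n + i. Consecutive diagonal entries are n + 1 apart.
main_by_range   = A[1:n+1:end]      # hand-built linear range
main_by_diagind = A[diagind(A)]     # range produced by LinearAlgebra.diagind
@assert main_by_range == main_by_diagind == diag(A)

# Off-diagonal number k: k > 0 lies above the main diagonal, k < 0 below it.
k = 1
via_cartesian = A[[CartesianIndex(i, i + k) for i in 1:n-k]]
via_linear    = A[diagind(A, k)]
@assert via_cartesian == via_linear == diag(A, k)
```

The CartesianIndex form spells out the (row, column) pairs, which is arguably easier to adapt to arbitrary index patterns, while diagind(A, k) keeps the cheaper linear range when a whole diagonal is wanted.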