Using Julia, I want to select the first n rows of a matrix per group.

In the following example, I want the first two rows where the second column is equal to 1.0, then the first two rows where the second column is equal to 2.0, etc.

XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX;XX2]

12×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  2.0
 1.0  2.0
 1.0  2.0
 2.0  3.0
 2.0  3.0
 2.0  3.0
 2.0  4.0
 2.0  4.0
 2.0  4.0

The final array would look like this:

8×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
 1.0  2.0
 1.0  2.0
 2.0  3.0
 2.0  3.0
 2.0  4.0
 2.0  4.0

I use the following code, but is there a simpler way (a single function) that already does this more efficiently?

x = Int[]                                    # row index where each group starts
for val in unique(beg[:, 2])
    push!(x, findfirst(==(val), beg[:, 2]))
end
idx = sort([x; x .+ 1])                      # first and second row of each group
final = beg[idx, :]
djourd1

1 Answer

Assuming your data:

  • is sorted (i.e. groups form contiguous blocks of rows)
  • has at least two rows in every group

(your code assumes both)

then you can generate the idx filter you want as follows:

idx = [i for i in axes(beg, 1) if i < 3 || beg[i, 2] != beg[i-1, 2] || beg[i, 2] != beg[i-2, 2]]
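The comprehension hardcodes two rows per group. As a hedged sketch (the function name `firstn_sorted` and the `col` keyword are hypothetical, not from any package), the same idea can be generalized to the first `n` rows per group by tracking each row's position within its run of equal values. It still assumes that each group forms one contiguous block:

```julia
# Hypothetical helper (not from any package): keep the first `n` rows of each
# group, where a group is a run of consecutive rows with equal values in
# column `col`. Assumes rows of the same group are adjacent (e.g. sorted data).
function firstn_sorted(A::AbstractMatrix, n::Integer; col::Integer = 2)
    idx = Int[]
    pos = 0                               # position of row i within its run
    for i in axes(A, 1)
        if i == first(axes(A, 1)) || A[i, col] != A[i - 1, col]
            pos = 1                       # a new group starts at row i
        else
            pos += 1
        end
        pos <= n && push!(idx, i)         # keep only the first n rows of the run
    end
    return A[idx, :]
end

XX  = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX; XX2]

firstn_sorted(beg, 2)   # the 8×2 matrix shown in the question
```

Because `n` is a parameter, the same single pass also covers "first row per group" (`n = 1`) or any other count.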

If you cannot make either assumption, please comment and I can show a more general solution.
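Whether the two assumptions hold can be checked before relying on them. The sketch below uses a hypothetical helper `groups_ok` (not from any package). It treats the group column as contiguous when the number of blocks of equal consecutive values matches the number of distinct values, and it checks each block against a minimum size:

```julia
# Hypothetical helper (not from any package): check whether the group column `v`
# satisfies both assumptions:
#   1. each group value occupies one contiguous block of rows
#   2. every block has at least `minsize` rows
function groups_ok(v::AbstractVector; minsize::Integer = 2)
    # indices where a new block of equal values starts
    starts = [i for i in eachindex(v) if i == firstindex(v) || v[i] != v[i - 1]]
    # contiguous iff there are exactly as many blocks as distinct values
    length(starts) == length(unique(v)) || return false
    lens = diff([starts; lastindex(v) + 1])   # length of each block
    return all(>=(minsize), lens)
end

XX  = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX; XX2]

groups_ok(beg[:, 2])   # true for the question's data
```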

EDIT

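Here is a more general example that uses only the standard library (Random is loaded just to shuffle the test data). It assumes neither that the data is sorted nor that every group has at least two rows: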

julia> using Random

julia> XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
6×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  2.0
 1.0  2.0
 1.0  2.0

julia> XX2 = [repeat([2.0], 7) vcat(repeat([3.0], 3), repeat([4.0], 3), 5.0)] # last group has length 1
7×2 Matrix{Float64}:
 2.0  3.0
 2.0  3.0
 2.0  3.0
 2.0  4.0
 2.0  4.0
 2.0  4.0
 2.0  5.0

julia> beg = [XX;XX2][randperm(13), :] # shuffle rows so groups no longer form contiguous blocks
13×2 Matrix{Float64}:
 2.0  3.0
 1.0  2.0
 2.0  4.0
 2.0  3.0
 2.0  4.0
 2.0  5.0
 2.0  3.0
 1.0  2.0
 1.0  2.0
 1.0  1.0
 1.0  1.0
 2.0  4.0
 1.0  1.0

julia> x = Dict{Float64, Vector{Int}}() # this will store indices per group
Dict{Float64, Vector{Int64}}()

julia> for (i, v) in enumerate(beg[:, 2]) # collect the indices
           push!(get!(x, v, Int[]), i)
       end

julia> x
Dict{Float64, Vector{Int64}} with 5 entries:
  5.0 => [6]
  4.0 => [3, 5, 12]
  2.0 => [2, 8, 9]
  3.0 => [1, 4, 7]
  1.0 => [10, 11, 13]

julia> idx = sort!(mapreduce(v -> first(v, 2), vcat, values(x))) # get first two indices per group in ascending order
9-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  8
 10
 11
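The selected rows are then `beg[idx, :]`. As a hedged alternative sketch (the function name `first_n_per_group` is hypothetical), the per-group index vectors can be swapped for a per-group counter. This selects the same rows in a single pass, keeps them in their original order, and needs no final `sort!`:

```julia
# Hypothetical helper (not from any package): keep the first `n` rows of every
# group, where a group is all rows sharing the same value in column `col`.
# Single pass with a per-group counter; the data does not need to be sorted,
# and groups with fewer than `n` rows are kept whole.
function first_n_per_group(A::AbstractMatrix, n::Integer; col::Integer = 2)
    seen = Dict{eltype(A), Int}()    # group value => rows of that group seen so far
    idx = Int[]
    for i in axes(A, 1)
        k = get(seen, A[i, col], 0) + 1
        seen[A[i, col]] = k
        k <= n && push!(idx, i)      # row i is among the first n of its group
    end
    return A[idx, :]
end

XX  = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX; XX2]

first_n_per_group(beg, 2)   # same 8×2 matrix as in the question
```

Applied to the shuffled 13×2 matrix above, it keeps rows 1, 2, 3, 4, 5, 6, 8, 10 and 11, matching `idx`.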
Bogumił Kamiński