
Given a DataFrame df with columns :a and :b, how can I get all elements of column :a from rows where, e.g., b = 0.5? Can this be done with DataFrames alone, or is a meta package needed?

loki

1 Answer

df[df.b .== 5, :]

Example

julia> df = DataFrame(a=11:17, b=vcat([5,5],1:5))
7×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 11    │ 5     │
│ 2   │ 12    │ 5     │
│ 3   │ 13    │ 1     │
│ 4   │ 14    │ 2     │
│ 5   │ 15    │ 3     │
│ 6   │ 16    │ 4     │
│ 7   │ 17    │ 5     │

julia> df[df.b .== 5, :]
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 11    │ 5     │
│ 2   │ 12    │ 5     │
│ 3   │ 17    │ 5     │

If you want just the column a:

julia> df[df.b .== 5, :].a
3-element Array{Int64,1}:
 11
 12
 17
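
The column can also be picked in the same indexing call, which avoids building the intermediate filtered DataFrame. A minimal sketch, assuming the same df as in the example above (the names a_vals and a_vals2 are only for illustration):

```julia
using DataFrames

df = DataFrame(a=11:17, b=vcat([5, 5], 1:5))

# Row mask and column name in one indexing call; returns a new vector
a_vals = df[df.b .== 5, :a]

# Equivalent: index the column vector itself with the boolean mask
a_vals2 = df.a[df.b .== 5]
```

Both forms should give the same three values as the `.a` version above.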

Yet another option is to use filter with an anonymous function (this can be slightly faster and use less memory, though that depends on the DataFrames version and the size of the data):

julia> filter(row -> row[:b] == 5, df)
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 11    │ 5     │
│ 2   │ 12    │ 5     │
│ 3   │ 17    │ 5     │
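
filter also accepts a column => predicate pair, so the predicate receives the values of :b directly instead of a whole row. A minimal sketch, assuming a DataFrames.jl release that provides subset and ByRow (1.0 or later, as far as I know); the names rows and rows2 are only for illustration:

```julia
using DataFrames

df = DataFrame(a=11:17, b=vcat([5, 5], 1:5))

# Column => predicate form: ==(5) is applied to each value of :b
rows = filter(:b => ==(5), df)

# subset passes whole columns to the function; ByRow applies it per element
rows2 = subset(df, :b => ByRow(==(5)))
```

Both calls return a new DataFrame and leave df unchanged. Which variant is fastest is best checked by benchmarking on your own data.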
Przemyslaw Szufel
  • Nice answer. Could you please clarify the comment about ``filter`` being slightly faster? In a related answer, Bogumił Kamiński stated something a bit different (perhaps the context is different): https://stackoverflow.com/questions/58220143/julia-dataframe-select-rows-based-values-of-one-column-belonging-to-a-set – PatrickT Jun 03 '22 at 03:00
  • My answer is without external packages. There are many ways to do the same thing in any language. – Przemyslaw Szufel Jun 03 '22 at 10:18