1

I have 3 DataFrames, each containing 3 columns: A, B, and C.

using DataFrames    
common_data = Dict("A" => [1, 2, 3], "B" => [10, 20, 30])
df1 = DataFrame(merge(common_data, Dict("C" => [100, 200, 300])))
df2 = DataFrame(merge(common_data, Dict("C" => [400, 500, 600])))
df3 = DataFrame(merge(common_data, Dict("C" => [700, 800, 900])))

I consider columns A and B as indices and want to perform an inner join on 3 DataFrames based on column C. This should be done only when the values in columns A and B of each DataFrame are the same. The column of the final output should be [A,B,C_df1,C_df2,C_df3]. How can I achieve this?

PrinceZard
  • 25
  • 4
  • Couple of thoughts. 1) "I have a DataFrame with 3 columns", not quite. You have 3 DataFrames each with 3 columns. 2) "the DataFrame of column C", that does not make sense. Maybe you want to reformulate what you're trying to do. 3) `using DataFrames`. – Andre Wildberg Aug 15 '23 at 10:50
  • 1
    You're right, will edit it. – PrinceZard Aug 15 '23 at 11:11

1 Answers1

1

Just do innerjoin:

julia> innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true)
3×5 DataFrame
 Row │ A      B      C      C_1    C_2
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1     10    100    400    700
   2 │     2     20    200    500    800
   3 │     3     30    300    600    900

it will auto-generate different column names than you want though.

So you can either preprocess your data frames:

julia> innerjoin(rename.([df1, df2, df3], :C .=> [:C_df1, :C_df2, :C_df3])...,
                 on=[:A, :B])
3×5 DataFrame
 Row │ A      B      C_df1  C_df2  C_df3
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1     10    100    400    700
   2 │     2     20    200    500    800
   3 │     3     30    300    600    900

or post-process the output:

julia> rename(innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true),
              :C => :C_df1, :C_1 => :C_df2, :C_2 => :C_df3)
3×5 DataFrame
 Row │ A      B      C_df1  C_df2  C_df3
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1     10    100    400    700
   2 │     2     20    200    500    800
   3 │     3     30    300    600    900
Bogumił Kamiński
  • 66,844
  • 3
  • 80
  • 107