2

My dataframe has 3 columns A, B and C and for each row only one of these columns contains a value.

I want a MERGE column that contains the values from A or B or C

using DataFrames

df = DataFrame(NAME = ["a", "b", "c"], A = [1, missing, missing], B = [missing, 2, missing], C = [missing, missing, 3])

3×4 DataFrame
│ Row │ NAME   │ A       │ B       │ C       │
│     │ String │ Int64?  │ Int64?  │ Int64?  │
├─────┼────────┼─────────┼─────────┼─────────┤
│ 1   │ a      │ 1       │ missing │ missing │
│ 2   │ b      │ missing │ 2       │ missing │
│ 3   │ c      │ missing │ missing │ 3       │

How the best julia way to get the MERGE column?

3×5 DataFrame
│ Row │ NAME   │ A       │ B       │ C       │ MERGE │
│     │ String │ Int64?  │ Int64?  │ Int64?  │ Int64 │
├─────┼────────┼─────────┼─────────┼─────────┼───────┤
│ 1   │ a      │ 1       │ missing │ missing │ 1     │
│ 2   │ b      │ missing │ 2       │ missing │ 2     │
│ 3   │ c      │ missing │ missing │ 3       │ 3     │

What I was able to work out so far is:

select(df, :, [:A, :B, :C] => ByRow((a,b,c) -> sum(skipmissing([a, b, c]))) => :MERGE)

What about a scenario for which there is a variable range of columns?

select(df, range => ??? => :MERGE)
attdona
  • 17,196
  • 7
  • 49
  • 60

1 Answers1

2

You can write it like this:

julia> transform!(df, [:A, :B, :C] => ByRow(coalesce) => :MERGE)
3×5 DataFrame
│ Row │ NAME   │ A       │ B       │ C       │ MERGE │
│     │ String │ Int64?  │ Int64?  │ Int64?  │ Int64 │
├─────┼────────┼─────────┼─────────┼─────────┼───────┤
│ 1   │ a      │ 1       │ missing │ missing │ 1     │
│ 2   │ b      │ missing │ 2       │ missing │ 2     │
│ 3   │ c      │ missing │ missing │ 3       │ 3     │

Instead of [:A, :B, :C] you can put any selector, like All(), Between(:A, :C), 1:3 etc.

Bogumił Kamiński
  • 66,844
  • 3
  • 80
  • 107
  • Many Thanks! Are the list of selectors documented somewhere? – attdona Sep 23 '20 at 12:23
  • 1
    For multi-column selectors it is stored in an unexported tuple `DataFrames.MULTICOLUMNINDEX_TUPLE` which holds currently: `(:AbstractVector, :Regex, :Not, :Between, :All, :Colon)`. – Bogumił Kamiński Sep 23 '20 at 13:13