
The DataFrame type in Julia allows you to access it as an array, so it is possible to remove columns via indexing:

df = df[:, [1:2; 4:end]] # remove column 3

The problem with this approach is that I often only know the column's name, not its column index in the table.

Is there a built-in way to remove a column by name?

Alternatively, is there a better way to do it than this?

colind = findfirst(==(colsymbol), propertynames(df))
df = df[:, [1:colind-1; colind+1:end]]

The above is failure-prone; there are several edge cases (single column, first column, last column, symbol not in table, etc.).

Thank you

Mageek

3 Answers


You can use select!:

julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)
4x3 DataFrame
|-------|---|-----|---|
| Row # | A | B   | C |
| 1     | 1 | "M" | 2 |
| 2     | 2 | "F" | 3 |
| 3     | 3 | "F" | 4 |
| 4     | 4 | "M" | 5 |

julia> select!(df, Not(:B))
4x2 DataFrame
|-------|---|---|
| Row # | A | C |
| 1     | 1 | 2 |
| 2     | 2 | 3 |
| 3     | 3 | 4 |
| 4     | 4 | 5 |

For more general operations, remember that you can also pass an array of Symbols or a Bool array, so arbitrarily complicated selections like

julia> df[:, [!(x in [:B, :C]) for x in propertynames(df)]]
4x1 DataFrame
|-------|---|
| Row # | A |
| 1     | 1 |
| 2     | 2 |
| 3     | 3 |
| 4     | 4 |

julia> df[:, setdiff(propertynames(df), [:C])]
4x1 DataFrame
|-------|---|
| Row # | A |
| 1     | 1 |
| 2     | 2 |
| 3     | 3 |
| 4     | 4 |

will also work.
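As a minimal sketch of the same idea with several columns at once, assuming a recent DataFrames release (1.x) where `Not` accepts a vector of Symbols or a string: the non-mutating `select` returns a new table and leaves the original untouched. The variable names `only_a` and `no_c` are just illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)

# Non-mutating: returns a new DataFrame without B and C; df itself is unchanged.
only_a = select(df, Not([:B, :C]))

# Column names can also be given as strings.
no_c = select(df, Not("C"))
```

Use `select!` instead of `select` when you want to modify `df` in place.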

xiaodai
DSM
Just to make a note of it, this example doesn't work on Julia 1.3.1 / DataFrames 0.21.0. @LyxUser12345 's answer using `select!` does in fact work. – quantif Jun 06 '20 at 19:14
Please modify the answer as it's no longer current. – xiaodai Jun 13 '20 at 13:03

delete! now throws a deprecation warning that suggests using select! instead:

julia> d = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 4     │
│ 2   │ 2     │ 5     │
│ 3   │ 3     │ 6     │

julia> select!(d, Not(:a))
3×1 DataFrame
│ Row │ b     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 4     │
│ 2   │ 5     │
│ 3   │ 6     │
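One edge case from the question, a name that is not in the table, still raises an error with `select!(d, Not(:z))`. A possible guard, sketched here under the assumption of DataFrames 1.x, is to check with `hasproperty` first. The helper name `dropcol!` is hypothetical and not part of DataFrames.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames function):
# drop `col` in place if it exists, otherwise leave `df` unchanged.
function dropcol!(df::DataFrame, col::Symbol)
    if hasproperty(df, col)
        select!(df, Not(col))
    end
    return df
end

d = DataFrame(a = 1:3, b = 4:6)
dropcol!(d, :a)   # removes column a
dropcol!(d, :z)   # no column z, so nothing happens and no error is thrown
```

Whether a missing column should be silently ignored or reported is a design choice; dropping the `hasproperty` check restores the error.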
LyxUser12345

As of Julia 1.0, you'll want to use deletecols!:

https://juliadata.github.io/DataFrames.jl/stable/lib/functions.html#DataFrames.deletecols!

julia> d = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 4     │
│ 2   │ 2     │ 5     │
│ 3   │ 3     │ 6     │

julia> deletecols!(d, 1)
3×1 DataFrame
│ Row │ b     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 4     │
│ 2   │ 5     │
│ 3   │ 6     │
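On DataFrames versions where `deletecols!` is no longer available, the positional form above can be approximated with `select!` and `Not`, which also accepts an integer column index. This is a sketch assuming DataFrames 1.x:

```julia
using DataFrames

d = DataFrame(a = 1:3, b = 4:6)

# Drop the first column by position, in place.
select!(d, Not(1))
```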
David J.
`deletecols!` has been deprecated as of Julia 1.3.1, @LyxUser12345 's answer using `select!` does work. – quantif Jun 06 '20 at 19:15
..still to me a "delete" keyname for a "delete" operation seems more direct than a "select(not())" one :-/ – Antonello Dec 14 '20 at 15:42