I have a DataFrame with Int64 columns:

using DataFrames
df = DataFrame(a=1:3,b=4:6,c=["a","b","c"])

3×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  String
─────┼──────────────────────
   1 │     1      4  a
   2 │     2      5  b
   3 │     3      6  c

Now, I want to change the column types to Float64. I know that I can do something like...

using DataFramesMeta, Chain

@chain df begin
    @transform!(:a = Float64.(:a),
                :b = Float64.(:b))
end

or

df.a = Float64.(df.a)
df.b = Float64.(df.b)

But how can I change all columns of type Int64 to Float64? Columns of other types should stay as they are.

(As you might guess from the example above I like the combination of Chain and DataFramesMeta, but of course all answers are more than welcome.)

Georgery

2 Answers

If every column can be converted to Float64, the simplest way to do it is (this updates your original data frame):

df .= Float64.(df)

With transform! you can alternatively do:

transform!(df, All() .=> ByRow(Float64), renamecols=false)

or you can also do:

mapcols!(ByRow(Float64), df)

(sorry - no DataFramesMeta.jl here yet - but things might change in the future)


If you want to change only some columns, e.g. those whose element type is Int, then do:

julia> transform!(df, names(df, Int) .=> ByRow(Float64), renamecols=false)
3×3 DataFrame
 Row │ a        b        c
     │ Float64  Float64  String
─────┼──────────────────────────
   1 │     1.0      4.0  a
   2 │     2.0      5.0  b
   3 │     3.0      6.0  c

or, returning a new data frame instead of updating df in place:

mapcols(df) do col
    eltype(col) === Int ? Float64.(col) : col
end
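
Since the question mentions a preference for Chain, the `names` selector above can also be sketched inside a `@chain` block. This is only a minimal sketch, assuming Chain.jl is installed; `result` is a name introduced here for illustration, and the non-mutating `transform` is used so that `df` itself is left untouched.

```julia
using DataFrames, Chain

df = DataFrame(a=1:3, b=4:6, c=["a", "b", "c"])

# names(df, Int) selects the columns whose element type is a subtype of Int
names(df, Int)  # ["a", "b"]

# `_` stands for the value flowing through the chain. Because it appears
# explicitly, Chain does not additionally insert it as the first argument.
result = @chain df begin
    transform(_, names(_, Int) .=> ByRow(Float64); renamecols=false)
end

eltype.(eachcol(result))  # [Float64, Float64, String]
```

Swapping `transform` for `transform!` would update `df` in place instead of creating a new data frame.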
Bogumił Kamiński
  • Sorry, this is not the answer I'm looking for. I changed the question to make it more clear. My point is to change all the **columns that have a specific type** and leave columns alone which have a different type. – Georgery Mar 31 '22 at 19:57
  • I have added an answer - use `names` as column selector in `transform!` or a more complex `mapcols` call. – Bogumił Kamiński Mar 31 '22 at 20:21

Another way (possibly less efficient):

julia> using DataFrames

julia> df = DataFrame(a=1:3,b=4:6,c=["a","b","c"])
3×3 DataFrame
 Row │ a      b      c      
     │ Int64  Int64  String 
─────┼──────────────────────
   1 │     1      4  a
   2 │     2      5  b
   3 │     3      6  c

julia> [df[!,col] = convert(Vector{Float64},df[!,col]) for col in names(df) if eltype(df[!,col]) <: Integer]
2-element Vector{Vector{Float64}}:
 [1.0, 2.0, 3.0]
 [4.0, 5.0, 6.0]

julia> df    
3×3 DataFrame
 Row │ a        b        c      
     │ Float64  Float64  String 
─────┼──────────────────────────
   1 │     1.0      4.0  a
   2 │     2.0      5.0  b
   3 │     3.0      6.0  c
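
The comprehension above is used only for its side effect, which is why the REPL prints a vector of the converted columns. The same idea can be sketched as a plain loop, which avoids building that throwaway vector. This is an illustrative rewrite of the same approach, not a different method.

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6, c=["a", "b", "c"])

# Replace every column whose element type is an integer type
# with a freshly allocated Float64 vector; other columns are skipped.
for col in names(df)
    if eltype(df[!, col]) <: Integer
        df[!, col] = convert(Vector{Float64}, df[!, col])
    end
end

eltype.(eachcol(df))  # [Float64, Float64, String]
```

Note that `<: Integer` matches every integer type (for example Int32 or UInt8), which is broader than the Int64-only selection asked for in the question.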

(adapted from

Antonello