When I'm running simulations, I like to initialize a big, empty array and fill it up as the simulation iterates through to the end. I do this with something like `res = Array(Real,(n_iterations,n_parameters))`. However, it would be nice to have named columns, which I think means using a DataFrame. Yet when I try to do something like `res_df = convert(DataFrame,res)` it throws an error. I would like a more concise approach than doing something like `res_df = DataFrame(a=Array(Real,N),b=Array(Real,N),c=Array(Real,N),....)`, as suggested by the answers to "julia create an empty dataframe and append rows to it".
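For readers on current versions, here is a minimal sketch of the matrix-first workflow described above, assuming DataFrames v1.x. There, the matrix-to-DataFrame step is written as a `DataFrame(matrix, names)` constructor call rather than `convert`. The sizes, the column names `a`, `b`, `c`, and the `rand` call standing in for a simulation step are all hypothetical placeholders.

```julia
using DataFrames

n_iterations, n_parameters = 1_000, 3  # hypothetical sizes

# Preallocate a concrete Float64 matrix (rather than an abstract `Real` one)
res = Matrix{Float64}(undef, n_iterations, n_parameters)

for i in 1:n_iterations
    # placeholder for one simulation step; replace with the real computation
    res[i, :] .= rand(n_parameters)
end

# Wrap the finished matrix in a DataFrame with named columns (each column is copied)
res_df = DataFrame(res, [:a, :b, :c])
```

Because the loop only touches a plain `Matrix{Float64}`, the column names are attached once at the end. Since the constructor copies each column, `res` can be reused or discarded afterwards.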
Asked by Will Townes
To preallocate a data frame, you must preallocate its columns. You can create three columns full of `missing` values by simply doing `[fill(missing, 10000) for _ in 1:3]`, but that doesn't actually allocate any storage for values. Those vectors have element type `Missing`, so they can only ever hold the single value `missing` and can't be changed to hold other values later. One way to do this is to use `Vector` constructors whose element type can hold either `Missing` or `Float64`:
```julia-repl
julia> DataFrame([Vector{Union{Missing, Float64}}(missing, 10000) for _ in 1:3], [:a, :b, :c])
10000×3 DataFrame
   Row │ a         b         c
       │ Float64?  Float64?  Float64?
───────┼──────────────────────────────
     1 │ missing   missing   missing
     2 │ missing   missing   missing
   ⋮   │    ⋮         ⋮         ⋮
 10000 │ missing   missing   missing
                     9997 rows omitted
```
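Note that rather than `Real`, this uses the concrete type `Float64`, which will have significantly better performance: a vector with an abstract element type like `Real` must store each element as a pointer to a separately boxed value.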
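A minimal sketch of how this fits the simulation loop from the question. It preallocates with the constructor above, writes each row by index, and then narrows the column types once every cell is filled. The run length, the column names `a`, `b`, `c`, and the `rand()` calls standing in for the real simulation step are hypothetical placeholders.

```julia
using DataFrames

n_iterations = 10_000  # hypothetical run length

# Preallocate three columns that can hold either `missing` or a Float64
res_df = DataFrame([Vector{Union{Missing, Float64}}(missing, n_iterations) for _ in 1:3],
                   [:a, :b, :c])

for i in 1:n_iterations
    # placeholder for one simulation step; replace with the real computation
    res_df[i, :a] = rand()
    res_df[i, :b] = rand()
    res_df[i, :c] = rand()
end

# Every cell is now populated, so drop `Missing` from the column element types
disallowmissing!(res_df)
```

As with any Julia hot loop, putting the loop inside a function rather than running it at global scope will generally be faster.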
(this answer was edited to reflect DataFrames v1.0 syntax)

mbauman
- If you know that all your columns are of the same type and that there will never be unpopulated (`NA`) elements, there may be other data structures you can use. Take a look at [NamedArrays.jl](https://github.com/davidavdav/NamedArrays.jl), or, if you're willing to fly by the seat of your pants and are working on the unstable 0.4, you can try my recent work-in-progress [AxisArrays.jl](https://github.com/mbauman/AxisArrays.jl). Both projects aim to more directly augment the built-in `Array` with dimension names and axis metadata, whereas DataFrames uses a collection-of-columns approach. – mbauman Feb 23 '15 at 20:01
- This method is now deprecated. – Jake Ireland Oct 13 '21 at 10:26