
I'm very new to Julia programming.

I have a folder of 14 CSV files that I join into one big data frame (262,673,020 rows × 77 columns), and I'm trying to save the result as a single CSV. When I use CSV.write, I get this error: ERROR: BoundsError: attempt to access 4194304-element Array{UInt8,1} at index [1:4194305].

So I tried to save it as a Feather file instead, but I get this error: ERROR: InexactError: trunc(Int32, 2147483662). It looks like some 32-bit maximum is being reached, but I'm not sure why.
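The numbers in this error can be checked directly in the REPL. This minimal sketch uses only Base Julia: it shows that 2147483662 is 15 larger than `typemax(Int32)` and that converting it to `Int32` raises an `InexactError`. That is consistent with something in the Feather writer storing a size or offset as an `Int32`, though the sketch itself does not prove where.

```julia
# Largest value an Int32 can hold: 2^31 - 1
limit = typemax(Int32)
println(limit)            # 2147483647

# The value from the Feather error message
value = 2147483662
println(value > limit)    # true
println(value - limit)    # 15

# Converting an out-of-range integer to Int32 throws an InexactError,
# the same exception type reported by Feather.write
try
    Int32(value)
catch err
    println(err isa InexactError)   # true
end
```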

I'm not sure what is going on; I just need some help understanding what to do.

Package versions: Julia 1.5.2, Glob v1.3.0, CSV v0.5.23, Tables v0.2.11, Feather v0.5.4

Updated package versions: Julia 1.5.2, CSV v0.7.7, DataFrames v0.21.8, Glob v1.3.0, Tables v1.1.0, Feather v0.5.6
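For reference, a version listing like this can be produced from the REPL of the active project. This is a minimal sketch using the standard-library `Pkg` module; it prints the Julia version and the exact versions of the packages in the current environment.

```julia
# Print the running Julia version and the exact package versions
# recorded in the active project environment
import Pkg

println("Julia ", VERSION)
Pkg.status()
```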

using Glob, CSV, DataFrames, Tables, Feather

fileDirectory = "location/CSV"
files = glob("*.csv", fileDirectory)

# Read each CSV file into its own DataFrame
list_df = [DataFrame(CSV.read(f)) for f in files]
# Outer-join data frames 3 through 14 on the INC_KEY column
Join_DF = join(list_df[3], list_df[4], list_df[5], list_df[6], list_df[7], list_df[8], list_df[9], list_df[10], list_df[11], list_df[12], list_df[13], list_df[14], on = :INC_KEY, kind = :outer)


Feather.write("location/join_files.feather", Join_DF) 
# ERROR: InexactError: trunc(Int32, 2147483662)

CSV.write("location/join_files.csv", Join_DF) 
# ERROR: BoundsError: attempt to access 4194304-element Array{UInt8,1} at index [1:4194305].
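A comment below points out that this code uses deprecated functions. Here is a sketch of the same read-and-join step written against the non-deprecated API, assuming CSV.jl 0.7 and DataFrames.jl 0.21 (the updated versions listed above). `read_all` and `join_all` are hypothetical helper names used only for illustration. `DataFrame(CSV.File(f))` avoids the sink-less `CSV.read(f)` call that newer CSV.jl releases deprecate, and `outerjoin` replaces `join(...; kind = :outer)`. This does not by itself address the write errors.

```julia
using CSV, DataFrames, Glob

# Hypothetical helper: read every CSV file in `dir` into its own DataFrame
read_all(dir) = [DataFrame(CSV.File(f)) for f in glob("*.csv", dir)]

# Hypothetical helper: outer-join two or more data frames on one key column
join_all(dfs; key = :INC_KEY) = outerjoin(dfs...; on = key)

# Usage mirroring the question (frames 3 through 14):
# list_df = read_all("location/CSV")
# Join_DF = join_all(list_df[3:14])
```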
CSV.write stack trace:
Stacktrace:
 [1] throw_boundserror(::Array{UInt8,1}, ::Tuple{UnitRange{Int64}}) at ./abstractarray.jl:541
 [2] checkbounds at ./abstractarray.jl:506 [inlined]
 [3] view at ./subarray.jl:158 [inlined]
 [4] writecell(::Array{UInt8,1}, ::Int64, ::Int64, ::IOStream, ::Int64, ::CSV.Options{UInt8,UInt8,Nothing,Tuple{}}) at /Users/.julia/packages/CSV/4GOjG/src/write.jl:147
 [5] #64 at /Users/.julia/packages/CSV/4GOjG/src/write.jl:182 [inlined]
 [6] macro expansion at /Users/.julia/packages/Tables/FXXeK/src/utils.jl:54 [inlined]
 [7] eachcolumn at /Users/.julia/packages/Tables/FXXeK/src/utils.jl:48 [inlined]
[8] writerow(::Array{UInt8,1}, ::Base.RefValue{Int64}, ::Int64, ::IOStream, ::Tables.Schema{(
[9] #55 at /Users/.julia/packages/CSV/4GOjG/src/write.jl:80 [inlined]
[10] (::CSV.var"#62#63"{CSV.var"#55#56"{Bool,Tables.Schema{(
[12] open(::Function, ::String, ::String) at ./io.jl:323
 [13] with at /Users/.julia/packages/CSV/4GOjG/src/write.jl:139 [inlined]
 [14] #write#54 at /Users/.julia/packages/CSV/4GOjG/src/write.jl:73 [inlined]
 [15] write(::Tables.Schema{(
 [16] write(::String, ::DataFrame; delim::Char, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, newline::Char, decimal::Char, dateformat::Nothing, quotestrings::Bool, missingstring::String, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/.julia/packages/CSV/4GOjG/src/write.jl:60
 [17] write(::String, ::DataFrame) at /Users/.julia/packages/CSV/4GOjG/src/write.jl:53
 [18] top-level scope at none:1
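The buffer in this error has exactly 4194304 = 2^22 bytes (4 MiB), and the writer tried to access indices 1:4194305. One hypothesis, not confirmed anywhere in this thread, is that a single formatted row or cell is larger than that buffer. The sketch below is a hypothetical, package-free check of the cell-level part of that idea: it reports the longest string value, in bytes, in each column. A row can still exceed the buffer even if no single cell does, so a clean result does not rule the hypothesis out.

```julia
# Size in bytes of the CSV.jl write buffer seen in the BoundsError (2^22)
const CSV_BUFFER_BYTES = 2^22

# Longest string value (in bytes) in one column; missing values and
# non-string values are ignored. Kept as a separate function so Julia
# can specialize the loop on each column's concrete type.
function longest_string_bytes(col)
    longest = 0
    for x in col
        if x isa AbstractString
            longest = max(longest, sizeof(x))
        end
    end
    return longest
end

# Map each column name to the length of its longest string value
max_string_bytes(colnames, cols) =
    Dict(string(n) => longest_string_bytes(c) for (n, c) in zip(colnames, cols))

# Hypothetical usage on the joined frame from the question:
# sizes = max_string_bytes(names(Join_DF), eachcol(Join_DF))
# filter(p -> p.second >= CSV_BUFFER_BYTES, sizes)
```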

Feather.write stack trace:
ERROR: InexactError: trunc(Int32, 2147483662)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Type{Int32}, ::Int64) at ./boot.jl:558
 [2] checked_trunc_sint at ./boot.jl:580 [inlined]
 [3] toInt32 at ./boot.jl:617 [inlined]
 [4] Int32 at ./boot.jl:707 [inlined]
 [5] convert at ./number.jl:7 [inlined]
 [6] setindex! at ./array.jl:847 [inlined]
 [7] offsets(::Type{Int32}, ::Type{UInt8}, ::PooledArrays.PooledArray{Union{Missing, String},UInt32,1,Array{UInt32,1}}) at /Users/.julia/packages/Arrow/q3tEJ/src/lists.jl:300
 [8] Arrow.NullableList{String,Int32,P} where P<:Arrow.AbstractPrimitive(::Type{UInt8}, ::PooledArrays.PooledArray{Union{Missing, String},UInt32,1,Array{UInt32,1}}) at /Users/.julia/packages/Arrow/q3tEJ/src/lists.jl:243
 [9] NullableList at /Users/.julia/packages/Arrow/q3tEJ/src/lists.jl:251 [inlined]
 [10] arrowformat at /Users/.julia/packages/Arrow/q3tEJ/src/arrowvectors.jl:242 [inlined]
 [11] getarrow(::PooledArrays.PooledArray{Union{Missing, String},UInt32,1,Array{UInt32,1}}) at /Users/.julia/packages/Feather/y64Pt/src/sink.jl:40
 [12] write(::IOStream, ::DataFrame; description::String, metadata::String) at /Users/.julia/packages/Feather/y64Pt/src/sink.jl:18
 [13] #20 at /Users/.julia/packages/Feather/y64Pt/src/sink.jl:32 [inlined]
 [14] open(::Feather.var"#20#21"{String,String,DataFrame}, ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:325
 [15] open at ./io.jl:323 [inlined]
 [16] #write#19 at /Users/.julia/packages/Feather/y64Pt/src/sink.jl:31 [inlined]
 [17] write(::String, ::DataFrame) at /Users/.julia/packages/Feather/y64Pt/src/sink.jl:31
 [18] top-level scope at none:1
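Frame [7] of this trace is `offsets(::Type{Int32}, ::Type{UInt8}, ::PooledArrays.PooledArray{Union{Missing, String},...})`. One reading of it (an interpretation of the trace, not something confirmed in the thread) is that the Arrow code behind Feather.jl stores the byte offsets of a string column as `Int32`, so it fails once that column's string data passes `typemax(Int32)` bytes. The hypothetical sketch below totals the string bytes per column, counting every row, and lists the columns above that limit.

```julia
# Total bytes of all string values in one column; missing and non-string
# values are skipped. The accumulator is an Int (64-bit on a 64-bit
# machine), so it does not overflow at the Int32 limit.
function total_string_bytes(col)
    total = 0
    for x in col
        if x isa AbstractString
            total += sizeof(x)
        end
    end
    return total
end

# Return `name => bytes` for every column whose string data would not
# fit in Int32 offsets
function int32_offset_overflows(colnames, cols)
    limit = typemax(Int32)
    flagged = Pair{String,Int}[]
    for (n, c) in zip(colnames, cols)
        bytes = total_string_bytes(c)
        if bytes > limit
            push!(flagged, string(n) => bytes)
        end
    end
    return flagged
end

# Hypothetical usage on the joined frame from the question:
# int32_offset_overflows(names(Join_DF), eachcol(Join_DF))
```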
orthoeng2
  • You are using deprecated functions. Can you please post the exact versions of the packages you are using? Also, can you please check that `Join_DF` is displayed without an error when you just show it in the REPL? – Bogumił Kamiński Oct 28 '20 at 16:54
  • Thank you Bogumił, I added the versions to the main post. As for Join_DF, no error shows up on my end when I display it. What do you think is best for me to do so that the error does not show up? – orthoeng2 Oct 28 '20 at 17:36
  • CSV 0.5.23 is a very old version of CSV.jl; the current version is 0.7.7. This is probably the source of the problem. You also have not reported the version of DataFrames.jl, but it is probably also very outdated. I recommend installing the latest versions of CSV.jl and DataFrames.jl. The easiest way to do this is to use a fresh project environment. See the first bullet in https://bkamins.github.io/julialang/2020/05/18/project-workflow.html – Bogumił Kamiński Oct 28 '20 at 17:50
  • Oh wow, thank you for this. I will try it and get back to you. – orthoeng2 Oct 28 '20 at 18:02
  • Thank you for the link. I have a question: I made a new project environment and installed CSV 0.7.7 and DataFrames v0.21.8, but when I run `using CSV` I get this error: ERROR: LoadError: too many parameters for type / ERROR: Failed to precompile CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b] to /Users/.julia/compiled/v1.5/CSV/HHBkp_feOl4.ji. – orthoeng2 Oct 29 '20 at 13:05
  • Can you post the full error on https://discourse.julialang.org/? Then I or someone else from the community can have a look at it. – Bogumił Kamiński Oct 29 '20 at 16:45
  • @BogumiłKamiński Hello, now that the CSV 0.7.7 problem is fixed, I ran my original code again and I'm back to the original error: ERROR: BoundsError: attempt to access 4194304-element Array{UInt8,1} at index [1:4194305] – orthoeng2 Oct 30 '20 at 13:18
  • Can you run `DataFrames._check_consistency(Join_DF)` and tell me what it produces? – Bogumił Kamiński Oct 30 '20 at 15:12
  • @BogumiłKamiński `save = DataFrames._check_consistency(Join_DF)` gives `nothing`. – orthoeng2 Oct 30 '20 at 15:18
  • This means that the problem is not with DataFrames.jl (which I maintain; the call shows that `Join_DF` is not corrupted) but with CSV.jl/Feather.jl. So again, it is best to move this question to Discourse to check with @quinnj what is going on. – Bogumił Kamiński Oct 30 '20 at 16:01

0 Answers