I have a dataset that looks like this:
I am taking a CSV file, converting it to Parquet, and then loading it into Arrow. (There is a reason why I am doing it this way.) My goal is to access the data in the row for "Algeria". This is my code:
using CSV, DataFrames, Parquet, Arrow, Statistics
df = CSV.read("temp.csv", DataFrame)
write_parquet("data_file.parquet", df)
df = DataFrame(read_parquet("data_file.parquet"))
Arrow.write("data_file.arrow", df)
df = DataFrame(Arrow.Table("data_file.arrow"))
dates = names(df)[5:end]
countries = unique(df[:, :"Country/Region"])
algeria = df[df."Country/Region" .== "Algeria", 4:end]
# println(sum(eachcol(algeria)))
println(Statistics.mean(eachcol(algeria)))
But the last part, which tries to retrieve the data from Arrow, throws this error:
MethodError: no method matching +(::Float64, ::String)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
+(::Float64, !Matched::Float64) at float.jl:401
+(!Matched::ChainRulesCore.One, ::Any) at /home/onur/.julia/packages/ChainRulesCore/7d1hl/src/differential_arithmetic.jl:94
What am I doing wrong?
This is what I get when I type "algeria" into the REPL:
Update: Implementation of Gabriel's suggestion:
begin
algeria = df[df."Country/Region" .== "Algeria", 4:end]
for i = 1:size(algeria, 2)
if eltype(algeria[!, i]) == String
algeria[!, i] = parse.(Float64, algeria[!, i])
end
end
Statistics.mean(eachcol(algeria))
end
This still throws the same error:
MethodError: no method matching +(::Float64, ::String)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
+(::Float64, !Matched::Float64) at float.jl:401
+(!Matched::ChainRulesCore.One, ::Any) at /home/onur/.julia/packages/ChainRulesCore/7d1hl/src/differential_arithmetic.jl:94
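One thing that may be worth checking here is the element type that each selected column actually reports, since the loop above only converts columns whose eltype is exactly String. The following is a minimal sketch rather than a confirmed diagnosis: the small table is hypothetical stand-in data (not the real file), and it assumes that at least one column holds numbers stored as text with a type such as Union{Missing, String}.

```julia
using DataFrames

# Hypothetical toy table standing in for the real data (an assumption, not the actual file)
df = DataFrame("Country/Region" => ["Algeria", "Albania"],
               "Long" => [1.66, 20.17],
               "1/22/20" => Union{Missing,String}["0", "3"],  # numbers stored as text
               "1/23/20" => [1, 4])

algeria = df[df."Country/Region" .== "Algeria", 2:end]

# Report each column's element type, and compare the exact `== String` test
# with a looser test that ignores `Missing` and accepts any string type
for (name, col) in pairs(eachcol(algeria))
    T = eltype(col)
    is_text = nonmissingtype(T) <: AbstractString
    println(name, " => ", T, " (exact String match: ", T == String, ", text-like: ", is_text, ")")
end
```

If a column reports something like Union{Missing, String} instead of String, the comparison eltype(...) == String is false, so the parse branch would be skipped for that column and it would still contain strings when the mean is computed.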