
I have been struggling to run `by` on a DataFrame loaded with Feather.jl:

using Feather, DataFrames, DataFramesMeta, CategoricalArrays

a = Feather.read("some_file.feather")
# the following call fails
aaa = by(a, :some_col, df -> sum(df[:some_val]))

It fails with the following error:

MethodError: Cannot convert an object of type String to an object of type CategoricalArrays.CategoricalValue{String,Int32}

The types involved are as follows:

typeof(a)
# DataFrames.DataFrame

typeof(a[:some_col]) 
# CategoricalArrays.NullableCategoricalArray{String,1,Int32}

typeof(a[:some_val])
# NullableArrays.NullableArray{Float64,1}

The CategoricalArrays documentation says little about working with DataFrames (nor, I suppose, should it).

However, when I replace the column with a constant test string, `by` works:

a[:some_col] = ["Testing" for i in 1:nrow(a)]
# this works
by(a,:some_col, df -> sum(df[:some_val]))

So the problem seems to lie with CategoricalArrays, but I can't figure out how to do this simple summary. Please help.
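For readers on current package versions: `by` has since been removed from DataFrames.jl in favour of `groupby` plus `combine`, and `Nullable`/`NullableArray` have been superseded by `missing`. A minimal sketch under those assumptions, grouping directly on a categorical column; the small DataFrame below is a hypothetical stand-in for `some_file.feather`, not the asker's data:

```julia
using DataFrames, CategoricalArrays

# Hypothetical stand-in for the Feather data: a categorical key column
# (including a missing key) and a numeric column with a missing value.
a = DataFrame(some_col = categorical(["x", "y", "x", missing]),
              some_val = [1.0, 2.0, missing, 4.0])

# Group on the categorical column and sum each group, skipping missing values.
res = combine(groupby(a, :some_col),
              :some_val => (v -> sum(skipmissing(v))) => :total)
```

Here `skipmissing` drops missing values inside each group, and rows whose key is `missing` form their own group because `groupby` keeps them by default.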

xiaodai
  • To convert from NullableCategoricalArray to DataArray use: `df_from_a = DataFrame(some_col = get.(get.(a[:some_col],Nullable{NAtype}(NA))), some_val = get.(get.(a[:some_val],Nullable{NAtype}(NA))))`. There should be an easier method but this should work. Now use `by(df_from_a, :some_col, df -> sum(df[:some_val]))` – Dan Getz Sep 22 '17 at 11:41
  • Please edit your question, so "edited" code actually parses. See for example missing `)` on `by(a,:some_col, df -> sum(df[:some_val])` in the question. – Dan Getz Sep 22 '17 at 11:43
  • Thanks @DanGetz, I've fixed the above. You are right: I had to make many changes to protect sensitive data. – xiaodai Sep 24 '17 at 23:21
  • @DanGetz The get.(get.(...)) method works, but I guess it's not ideal. Ideally CategoricalArrays should just work right out of the box https://github.com/JuliaData/CategoricalArrays.jl/issues/80 – xiaodai Sep 24 '17 at 23:22
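The workaround from the first comment, converting the categorical column to plain values before grouping, can be sketched in current Julia as follows. This assumes CategoricalArrays' `unwrap` (which returns the value a `CategoricalValue` wraps and passes `missing` through unchanged) and uses a hypothetical stand-in DataFrame rather than the original Feather file:

```julia
using DataFrames, CategoricalArrays

# Hypothetical stand-in data with a categorical key column.
a = DataFrame(some_col = categorical(["x", "y", "x", missing]),
              some_val = [1.0, 2.0, missing, 4.0])

# Replace the categorical column with its underlying plain values
# (String or missing); broadcasting over the array yields an ordinary Vector.
a.some_col = unwrap.(a.some_col)

# Grouped sum on the now-plain String column, skipping missing values.
res = combine(groupby(a, :some_col),
              :some_val => (v -> sum(skipmissing(v))) => :total)
```

With recent DataFrames.jl versions grouping on the categorical column directly should also work, so this conversion is mainly useful when other code expects plain strings.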
