
I am writing a struct array in C# using the following code:

        var structField = new StructType(
            new []
            {
                new Field("field1", new StringType(), nullable: false),
                new Field("field2", new Int64Type(), nullable: false)
            });
        
        StringArray stringArray = new StringArray.Builder()
            .AppendRange(new[] {"angel", "bobby", "charlie"})
            .Build();
        Int64Array intArray = new Int64Array.Builder()
            .AppendRange(new[] { 1L,2,3 })
            .Build();
        
        StructArray structs = new StructArray(structField, 3, new IArrowArray[] { stringArray, intArray }, ArrowBuffer.Empty, nullCount: 0);

        var recordBatch = new Apache.Arrow.RecordBatch.Builder()
            .Append("col1", false, structs)
            .Build();

        using (var stream = File.OpenWrite("test.arrow"))
        using (var writer = new Apache.Arrow.Ipc.ArrowFileWriter(stream, recordBatch.Schema, true))
        {
            await writer.WriteRecordBatchAsync(recordBatch);
            await writer.WriteEndAsync();
        }

But when I try to read it in Julia, I get an error:

using Arrow
table = Arrow.Table("test.arrow")
@show table
@show table[1]

ERROR: LoadError: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:872
  iterate(::Union{LinRange, StepRangeLen}, ::Integer) at range.jl:872
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:712
  ...
Stacktrace:
 [1] getdictionaries!(dictencoded::Dict{Int64, Arrow.Flatbuf.Field}, field::Arrow.Flatbuf.Field)
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:409
 [2] getdictionaries!(dictencoded::Dict{Int64, Arrow.Flatbuf.Field}, field::Arrow.Flatbuf.Field)
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:410
 [3] macro expansion
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:339 [inlined]
 [4] macro expansion
   @ ./task.jl:454 [inlined]
 [5] Arrow.Table(blobs::Vector{Arrow.ArrowBlob}; convert::Bool)
   @ Arrow ~/.julia/packages/Arrow/P0wVk/src/table.jl:321
 [6] Table
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:295 [inlined]
 [7] #Table#98
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:290 [inlined]
 [8] Table (repeats 2 times)
   @ ~/.julia/packages/Arrow/P0wVk/src/table.jl:290 [inlined]
 [9] top-level scope

Similar error for:

stream = Arrow.Stream("test.arrow")
for d in stream
    @show d
end

I've also tested it with Python:

import pyarrow.feather as feather

table = feather.read_table("test.arrow")
print(table[0])

OSError: Verification of flatbuffer-encoded Footer failed.

So the problem may be that the C# writer is not writing the file footer.
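One way to check that suspicion without going through any Arrow library is to look at the raw bytes. As far as I understand the Arrow IPC file format, a file starts with the magic string `ARROW1` padded to 8 bytes and ends with the footer, a 4-byte little-endian footer size, and the magic string `ARROW1` again. The sketch below only tests for that layout; the helper name `inspect_arrow_file` and the keys of the returned dict are my own and not part of any Arrow API, and it does not validate the flatbuffer contents of the footer.

```python
import struct

ARROW_MAGIC = b"ARROW1"
HEADER_LEN = 8    # leading magic (6 bytes) plus 2 bytes of padding
TRAILER_LEN = 10  # footer size (int32, little-endian) plus trailing magic


def inspect_arrow_file(data: bytes) -> dict:
    """Report whether raw bytes have the framing of an Arrow IPC file.

    Hypothetical helper: it checks the magic strings and the footer size
    field only, not the flatbuffer-encoded footer itself.
    """
    starts_with_magic = data[:len(ARROW_MAGIC)] == ARROW_MAGIC
    ends_with_magic = (
        len(data) >= HEADER_LEN + TRAILER_LEN
        and data[-len(ARROW_MAGIC):] == ARROW_MAGIC
    )

    footer_size = None
    if ends_with_magic:
        # The 4 bytes just before the trailing magic hold the footer length.
        (footer_size,) = struct.unpack(
            "<i", data[-TRAILER_LEN:-len(ARROW_MAGIC)]
        )

    # The footer has to fit between the leading magic and the trailer.
    footer_size_plausible = (
        footer_size is not None
        and 0 < footer_size <= len(data) - HEADER_LEN - TRAILER_LEN
    )

    return {
        "starts_with_magic": starts_with_magic,
        "ends_with_magic": ends_with_magic,
        "footer_size": footer_size,
        "footer_size_plausible": footer_size_plausible,
    }
```

Running it as `inspect_arrow_file(open("test.arrow", "rb").read())` should show whether the trailing magic and a plausible footer size are present. If they are, the footer was written and the problem is more likely in its contents or in the readers.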

BAR
  • It looks like you're mixing up three formats (Arrow IPC, "Table", and "Feather") in your question. Your C# code looks good, but I think you'll want your Julia code to read your IPC file using Stream, not Table. The docs say you can read an Arrow IPC file batch-by-batch into Tables. I'm not sure (without testing myself) whether your Python code should work or not. – amoeba Mar 30 '23 at 00:18
  • @amoeba I tried both the stream and file readers in Python and got similar footer errors. It seems C# isn't writing the footer. – BAR Mar 30 '23 at 01:36
  • This looks like a bug in arrow-julia. You might follow up by filing an issue at https://github.com/apache/arrow-julia/issues. If I keep my file types straight (Arrow IPC File vs Arrow IPC Stream) and use the appropriate readers and writers (in C#, `Apache.Arrow.Ipc.ArrowStreamWriter` to write an Arrow IPC Stream file and `Apache.Arrow.Ipc.ArrowFileWriter` to write an Arrow IPC File), Python can read either. Julia can only read an Arrow IPC Stream file (`Arrow.Stream()`) without error, not an Arrow IPC File (`Arrow.Table()`). I get the same error as you. – amoeba Apr 01 '23 at 01:08
  • PS: I published my code and output/errors to https://gist.github.com/amoeba/1883b2823fe597a5921a20d9af7baa47 – amoeba Apr 01 '23 at 01:16

0 Answers