I am trying to read, in Python, an Arrow file that I wrote as a sequence of record batches. For some reason I only get the first struct entry. I have verified that the file holds more than one item and is the expected size.
import pyarrow as pa

with pa.OSFile(input_filepath, 'rb') as source:
    with pa.ipc.open_stream(source) as reader:
        for batch in reader:
            # only one batch here
            my_struct_col = batch.column('col1')
            # flatten() returns one array per struct field
            field1_values = my_struct_col.flatten()
            print(field1_values)
I am writing the file in Julia using:
using Arrow

struct OutputData
    name::String
    age::Int32
end

writer = open(filePath, "w")
data = OutputData("Alex", 20)
for _ = 1:1000
    t = (col1=[data],)
    table = Arrow.Table(Arrow.tobuffer(t))
    Arrow.write(writer, table)
end
close(writer)
I believe both languages are using the Arrow streaming IPC format for the file.