
A pandas DataFrame is heavyweight, so I want to avoid it. But I still want to construct a PyArrow Table so that I can store the data in Parquet format.

I searched and read the documentation and tried to use Table.from_arrays(), but it is not working.

import pyarrow as pa

field = [pa.field('name', pa.string()), pa.field('age', pa.int64())]
arrays = [pa.array(['Tom']), pa.array([23])]
pa.Table.from_arrays(pa.schema(field), arrays)

The error is: Length of names (1) doesn't match length of arrays (2)

Zichu Lee

1 Answer


See the Table.from_arrays documentation here: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_arrays The first argument it expects is the list of arrays, not the schema. So you can either pass the schema as a keyword argument:

In [64]: pa.Table.from_arrays(arrays, schema=pa.schema(field))
Out[64]: 
pyarrow.Table
name: string
age: int64

Or pass the column names instead of the full schema:

In [65]: pa.Table.from_arrays(arrays, names=['name', 'age']) 
Out[65]: 
pyarrow.Table
name: string
age: int64

In the next version of pyarrow (0.14.0), you will also be able to do:

In [51]: pa.Table.from_pydict({'name': pa.array(['Tom']), 'age': pa.array([23])})
Out[51]: 
pyarrow.Table
name: string
age: int64
joris