I am trying to write a pandas DataFrame to a Parquet file that is compatible with a table in Impala, but I am struggling to find a solution.
My df has 3 columns:
code int64
number float
name object
When I write this to a Parquet file and load it into Impala, the pandas dtypes are preserved and the load fails. I would like the Parquet file to be saved with the following schema:
code int
number decimal(36,18)
name string
I tried this:
env_schema = """
code int
number decimal(36,18)
name string
"""
df.to_parquet(f'path', index=False, schema=env_schema)
but I get the following error:
Argument 'schema' has incorrect type (expected pyarrow.lib.Schema, got str)
Does anyone know how I could achieve this? Thanks