I'm having some trouble making sense of Dask's to_parquet method and why it has a schema argument. When I have a Dask DataFrame variable named ddf and access ddf.dtypes, I can see the data type of each column, meaning that Dask does know the dtype of each column, right? If that's the case, then why does it need to infer the data schema when I do ddf.to_parquet?
I'm asking this because according to Dask's documentation (https://docs.dask.org/en/stable/generated/dask.dataframe.to_parquet.html), the default value for the schema argument is "infer", which seems counter-intuitive to me.
I know this is a basic question, but I'm a Dask newbie trying to make sense of the tool.
Thanks.