I'm writing some datasets to Parquet using pyarrow.parquet.write_to_dataset().
Now I'm trying to enable the Bloom filter (located in the metadata) when writing, but I can't find a way to do this. I know that in Spark you can do something like:
spark.sql("set parquet.filter.bloom.enabled=true")
spark.sql("set parquet.filter.columnindex.enabled=false")
spark.sql("set parquet.filter.stats.enabled=false")
as done in this thread.
Is there a way to do this with PyArrow or some other library?
Currently I am writing the dataset with
import pyarrow.parquet as pq

pq.write_to_dataset(
    table=table,
    root_path=output_file,
    filesystem=fsys,
    schema=schema,
)