I am trying to scan a folder containing multiple parquet files into a Polars DataFrame. On this question, the following is given as an answer using S3:
```python
from pyarrow.dataset import dataset
import gcsfs
import polars as pl

# setup cloud filesystem access
cloudfs = gcsfs.GCSFileSystem(project="my-project")

# reference multiple parquet files
pyarrow_dataset = dataset(
    source="gs://bucket/path/*.parquet",
    filesystem=cloudfs,
    format="parquet",
)

# load efficiently into polars
ldf = pl.scan_pyarrow_dataset(pyarrow_dataset)
```
When I try to change this to use the GCS filesystem, I get the following error:

```
AttributeError: 'GCSFileSystem' object has no attribute 'schema'
```

Is it possible to read multiple parquet files from GCS directly into a Polars DataFrame?