I am testing the polars implementation in R, and I want to lazily read and process a parquet file from a public AWS S3 bucket.
After loading the following packages:
library(aws.s3)
library(arrow)
library(polars)
I tried to open a connection using arrow::s3_bucket(url), where url is the location of the parquet file, s3://rimrep-data-public/091-aims-sst/test-50-64-spatialpart:
tempBucket <- s3_bucket(bucket = 's3://rimrep-data-public/091-aims-sst/test-50-64-spatialpart')
df <- scan_parquet(tempBucket)
This resulted in:
Error in new_from_parquet(path = file, n_rows = n_rows, cache = cache, : not a string object
Any suggestions would be greatly appreciated.
Cheers, Eduardo
UPDATE: I also tried reading the file using the S3 URI directly:
df <- scan_parquet('s3://rimrep-data-public/091-aims-sst/test-50-64-spatialpart')
and got:
thread '<unnamed>' panicked at 'One or more of the cloud storage features ('aws', 'gcp', ...) must be enabled.'