I am using the Python version of the polars library (https://github.com/pola-rs/polars) to read a parquet file with a large number of rows. I am trying to read the file from an Azure storage account using the read_parquet method. I can see there is a storage_options argument that can be used to specify how to connect to the data storage. Here is the definition of the read_parquet method:
def read_parquet(
    source: str | Path | BinaryIO | BytesIO | bytes,
    columns: list[int] | list[str] | None = None,
    n_rows: int | None = None,
    use_pyarrow: bool = False,
    memory_map: bool = True,
    storage_options: dict[str, object] | None = None,
    parallel: ParallelStrategy = "auto",
    row_count_name: str | None = None,
    row_count_offset: int = 0,
    low_memory: bool = False,
    pyarrow_options: dict[str, object] | None = None,
) -> DataFrame:
Can anyone let me know what values I need to provide in storage_options to connect to the Azure storage account using a system assigned managed identity? Unfortunately I could not find any example for this. Most of the examples use a connection string or access keys, which I cannot use for security reasons.
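For reference, the kind of example I keep finding looks roughly like the sketch below (the path and connection string are placeholders; connection_string is a parameter accepted by the underlying Azure filesystem implementation), and this is exactly what I want to avoid:

import polars as pl

# What I do NOT want: authenticating with a connection string.
# <container>, <path> and <connection-string> are placeholders.
df = pl.read_parquet(
    "abfs://<container>/<path>/file.parquet",
    storage_options={"connection_string": "<connection-string>"},
)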
Edit: I just came to know that storage_options is passed on to another library called fsspec, but I have no idea how to use it.
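What I am hoping for is something along the lines of the sketch below. I am not sure it is correct, which is why I am asking; it assumes storage_options is handed to fsspec's Azure filesystem (adlfs), that account_name, credential and anon are parameters accepted there, and that DefaultAzureCredential from the azure-identity package picks up the system assigned managed identity:

import polars as pl
from azure.identity import DefaultAzureCredential  # from the azure-identity package

# On an Azure resource with a system assigned managed identity,
# DefaultAzureCredential should resolve to that identity.
credential = DefaultAzureCredential()

df = pl.read_parquet(
    "abfs://<container>/<path>/file.parquet",      # placeholder path
    storage_options={
        "account_name": "<storage-account-name>",  # placeholder account name
        "credential": credential,                  # token credential instead of a key
        "anon": False,                             # force authenticated access
    },
)

Is this the right way to pass a managed identity, or are different keys expected in storage_options?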