Where do I find the ParquetDatasetPiece class?

Question

While reading the petastorm/etl/dataset_metadata.py script, I found this code:

if row_groups_key != ".":
    for row_group in range(row_groups_per_file[row_groups_key]):
        rowgroups.append(pq.ParquetDatasetPiece(
            piece.path,
            open_file_func=dataset.fs.open, 
            row_group=row_group, 
            partition_keys=piece.partition_keys
        ))

where pq is imported as:

from pyarrow import parquet as pq

I've searched everywhere for the ParquetDatasetPiece class and can't find it. Can somebody tell me where the ParquetDatasetPiece class is defined?

score 1 · Accepted Answer · answered Sep 12 '22 at 06:53

You can find it in the parquet part of the pyarrow codebase: https://github.com/apache/arrow/blob/951663a41c183c8fec5a4da9a8f9daf45ed85451/python/pyarrow/parquet/core.py#L1059-L1084

Note: it has been deprecated since pyarrow version 5.0.
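If you want to check where a class lives in whatever version you have installed, something like the sketch below may help. The helper name `locate_attribute` is made up for illustration (it is not part of pyarrow or petastorm), and the demo call uses a standard-library class so the snippet runs without pyarrow.

```python
import importlib
import inspect


def locate_attribute(module_name, attr_name):
    """Return (defining_module, source_file) for module_name.attr_name.

    Returns None when the module does not expose that attribute.
    `locate_attribute` is a hypothetical helper, not a pyarrow API.
    """
    module = importlib.import_module(module_name)
    obj = getattr(module, attr_name, None)
    if obj is None:
        return None
    defining_module = getattr(obj, "__module__", None)
    try:
        source_file = inspect.getsourcefile(obj)
    except (TypeError, OSError):
        # Built-in or C-extension objects have no Python source file.
        source_file = None
    return defining_module, source_file


# Demonstrated on a standard-library class so no extra packages are needed.
print(locate_attribute("json", "JSONDecoder"))

# With pyarrow installed, the same call could be tried as:
# locate_attribute("pyarrow.parquet", "ParquetDatasetPiece")
# It returns None if the installed version does not provide the class.
```

The first element of the result is the module that actually defines the class, which can differ from the module you imported it from.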

alenka

346
1
3

I've been reading petastorm's code and this class is in the latest version of this package. Thanks for the link – Omar Puentes Sep 16 '22 at 16:13

Where do i find ParquetDatasetPiece class?

1 Answers1