A Python interface to the Parquet file format.
Questions tagged [fastparquet]
141 questions
1
vote
0 answers
While creating a Parquet file using Dask (fastparquet) with the append option, the first partition file is missing from the folder
When we create a Parquet file with the append option, the first partition file of the Parquet dataset is missing from the final result. Does anyone know the reason? We are using Dask 2.30, and this happens only in one environment but in another completely different …

Arun
- 41
- 1
- 4
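A minimal sketch of the append pattern being described, assuming a local output path and fastparquet as the engine; the data and paths are placeholders:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)
df.to_parquet("out/", engine="fastparquet")                # initial write
df.to_parquet("out/", engine="fastparquet", append=True)   # later append; earlier part files should remain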
1
vote
1 answer
fastparquet export for Redshift
I had a very simple idea: Use Python Pandas (for convenience) to do some simple database operations with moderate data amounts and write the data back to S3 in Parquet format.
Then, the data should be exposed to Redshift as an external table in…

Werner
- 95
- 11
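A sketch of the write step, assuming s3fs is installed so pandas can write to an s3:// URL; the bucket path and data are placeholders, and the CREATE EXTERNAL TABLE step on the Redshift side is separate:

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})   # stand-in data
df.to_parquet("s3://my-bucket/exports/data.parquet",
              engine="fastparquet", compression="snappy", index=False)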
1
vote
2 answers
How can one append to parquet files and how does it affect partitioning?
Does Parquet allow appending to a Parquet file periodically?
How does appending relate to partitioning, if at all? For example, if I were able to identify a low-cardinality column and partition by it, and I were to append more data…

Abhishek Malik
- 305
- 4
- 14
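A sketch of how fastparquet combines the two, assuming a hive-style dataset and a placeholder low-cardinality column:

import fastparquet as fp
import pandas as pd

df = pd.DataFrame({"category": ["a", "b"], "value": [1, 2]})   # stand-in data
# the first write creates one directory per partition value
fp.write("dataset", df, file_scheme="hive", partition_on=["category"])
# an append adds new row groups under the matching partition directories
fp.write("dataset", df, file_scheme="hive", partition_on=["category"], append=True)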
1
vote
1 answer
Split parquet from s3 into chunks
I'm using the following code to read parquet files from s3. Next, I want to iterate over it in chunks. How can I achieve this?
import s3fs
import fastparquet as fp
fs = s3fs.S3FileSystem()
bucket, path = 'mybucket',…

ProgramSpree
- 372
- 5
- 21
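One way to get chunked iteration with fastparquet is per row group; a sketch assuming a placeholder S3 key:

import s3fs
import fastparquet as fp

fs = s3fs.S3FileSystem()
pf = fp.ParquetFile("mybucket/path/file.parquet", open_with=fs.open)   # placeholder key
for chunk in pf.iter_row_groups():   # one pandas DataFrame per row group
    print(len(chunk))                # stand-in for per-chunk processing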
1
vote
1 answer
Read parquet file using pd.read_parquet looking for a schema
I'm working on an app that writes Parquet files.
For testing purposes, I'm trying to read a generated file with pd.read_parquet.
I get a really strange error that asks for a schema:
self = <[AttributeError("'ParquetFile' object has no attribute…

Alex
- 389
- 4
- 21
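When debugging an error like this, it can help to inspect what fastparquet itself sees in the file; a sketch with a placeholder filename:

import fastparquet as fp

pf = fp.ParquetFile("generated.parquet")   # placeholder for the generated file
print(pf.schema)    # the Parquet schema fastparquet parsed from the footer
print(pf.dtypes)    # the pandas dtypes it would produce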
1
vote
1 answer
How to read a 30 GB Parquet file in Python
I am trying to read data from a large Parquet file of 30 GB. My machine's memory cannot support the default read with fastparquet in Python, so I do not know what I should do to lower the memory usage of the reading process.

Kehan Chen
- 11
- 1
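A common way to bound memory with fastparquet is to read only the needed columns, one row group at a time; a sketch with placeholder file and column names:

import fastparquet as fp

pf = fp.ParquetFile("big.parquet")   # placeholder for the 30 GB file
for chunk in pf.iter_row_groups(columns=["col_a", "col_b"]):
    print(len(chunk))   # stand-in for incremental per-chunk processing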
1
vote
1 answer
Reading index based range from Parquet File using Python
I'm trying to read a range of data (say rows 1000 to 5000) from a parquet file. I've tried pandas with the fastparquet engine and even pyarrow, but can't seem to find any option to do so.
Is there any way to achieve this?

MetalMonkey
- 17
- 1
- 9
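fastparquet has no direct row-range read, but the row-group sizes in the footer let one keep only the overlapping groups and trim afterwards; a sketch assuming a placeholder file (skipped groups are still decoded, but never all held in memory at once):

import fastparquet as fp
import pandas as pd

pf = fp.ParquetFile("data.parquet")   # placeholder file
start, stop = 1000, 5000

frames, seen, first = [], 0, None
for rg, chunk in zip(pf.row_groups, pf.iter_row_groups()):
    lo, hi = seen, seen + rg.num_rows
    if hi > start and lo < stop:      # this row group overlaps the range
        if first is None:
            first = lo
        frames.append(chunk)
    seen = hi
    if seen >= stop:
        break

# trim the concatenated row groups down to the exact row range
df = pd.concat(frames, ignore_index=True).iloc[start - first:stop - first]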
1
vote
1 answer
Is it possible to read a Parquet dataset partitioned by hand using Dask with the Fastparquet reader?
I created a Parquet dataset partitioned as follows:
2019-taxi-trips/
    month=1/
        data.parquet
    month=2/
        data.parquet
    ...
    month=12/
        data.parquet
This organization follows the Parquet dataset…

Aleksey Bilogur
- 3,686
- 3
- 30
- 57
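A sketch of the read, assuming the directory layout above; the hive-style month=... directories are reconstructed as a column:

import dask.dataframe as dd

df = dd.read_parquet("2019-taxi-trips/", engine="fastparquet")
print(df.columns)   # should include "month" derived from the directory names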
1
vote
1 answer
How to read nested struct Parquet files in Python?
I have a Parquet file which contains a list of structs, and I cannot seem to read it with any of the available Python Parquet libraries. Some of them return an error noting that 'list of structs' is not yet supported, and the others just make a pandas…

Nilan Saha
- 191
- 1
- 9
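At the time, pyarrow handled more nested types than fastparquet, so reading through Arrow is one workaround; a sketch with a placeholder filename:

import pyarrow.parquet as pq

table = pq.read_table("nested.parquet")   # placeholder file with a list-of-structs column
df = table.to_pandas()                    # struct values arrive as Python dicts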
1
vote
1 answer
Reading partitioned Parquet files to DataFrame in Python (in memory) where a column type is array of array
Context
I have partitioned Parquet files in S3. I want to read and concatenate them into a DataFrame so I can query and view the data (in memory). I have gotten this far; however, one column's data, with the type (array<array<…>>), is converted…

Mahshid Zeinaly
- 3,590
- 6
- 25
- 32
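A sketch of opening a multi-file S3 dataset as one logical ParquetFile, with a placeholder bucket and layout:

import s3fs
import fastparquet as fp

fs = s3fs.S3FileSystem()
files = fs.glob("mybucket/dataset/*/*.parquet")   # placeholder layout
pf = fp.ParquetFile(files, open_with=fs.open)     # a list of files behaves as one dataset
df = pf.to_pandas()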
1
vote
0 answers
'S3File' object has no attribute 'forced'
Trying to append to a Parquet file in S3 using the fastparquet library, I am getting the error below:
File "/Users/baluinfo/PycharmProjects/untitled/rough.py", line 55, in
write(parqKey, ws1, write_index=False, append=True, compression='GZIP', open_with=myopen)
…

Bala
- 51
- 1
- 2
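For reference, the usual shape of such an append through s3fs looks like the sketch below; the names mirror the question and are placeholders, and append=True requires the target file to already exist and be readable through the same filesystem:

import pandas as pd
import s3fs
from fastparquet import write

fs = s3fs.S3FileSystem()
ws1 = pd.DataFrame({"x": [1]})       # stand-in for the question's DataFrame
parqKey = "mybucket/key.parquet"     # placeholder key
write(parqKey, ws1, write_index=False, append=True,
      compression="GZIP", open_with=fs.open)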
1
vote
0 answers
Cannot install fastparquet in jupyter notebook
I am trying to install fastparquet in order to write a CSV into a Parquet file. Using a Jupyter notebook with Python 3,
the cell does not show any result after running the following command:
pip install fastparquet
I run another simple command and it…

jusmin
- 7
- 2
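One thing worth checking: a bare pip install in a notebook cell may target a different environment than the running kernel. A sketch, assuming IPython's %pip magic is available (IPython 7.3+):

# in a notebook cell: %pip installs into the same environment the kernel runs in
%pip install fastparquet

# then verify the import in a fresh cell
import fastparquet
print(fastparquet.__version__)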
1
vote
1 answer
Moving data from a database to Azure blob storage
I'm able to use dask.dataframe.read_sql_table to read the data e.g. df = dd.read_sql_table(table='TABLE', uri=uri, index_col='field', npartitions=N)
What would be the next (best) steps to saving it as a parquet file in Azure blob storage?
From my…

Ray Bell
- 1,508
- 4
- 18
- 45
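A sketch of the follow-on step, assuming the adlfs package is installed so dask can write through the abfs:// protocol; the connection string, account name, and key are placeholders:

import dask.dataframe as dd

uri = "sqlite:///my.db"   # placeholder connection string
df = dd.read_sql_table(table="TABLE", uri=uri, index_col="field", npartitions=4)
df.to_parquet("abfs://container/path/",
              engine="fastparquet",
              storage_options={"account_name": "ACCOUNT", "account_key": "KEY"})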
1
vote
0 answers
dask computation got different errors with pyarrow and s3
I was doing some parallel groupby computation with dask, using pyarrow to load Parquet files from s3. However, the same piece of code may run or fail (with different error messages) seemingly at random. The same issue happened when using…

zhh210
- 388
- 4
- 12
1
vote
1 answer
compression option in fastparquet is not consistent
According to the fastparquet project page, fastparquet supports various compression methods:
Optional (compression algorithms; gzip is always available):
snappy (aka python-snappy)
lzo
brotli
lz4
zstandard
In particular, zstandard is modern…

user15964
- 2,507
- 2
- 31
- 57
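A sketch of selecting one of the optional codecs per write; "ZSTD" requires the zstandard package to be importable, and a per-column mapping is also accepted:

import fastparquet as fp
import pandas as pd

df = pd.DataFrame({"x": range(5)})                 # stand-in data
fp.write("data.parquet", df, compression="ZSTD")   # raises if zstandard is missing
# per-column choices also work, e.g. compression={"x": "GZIP"}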