Questions tagged [fastparquet]
A Python interface to the Parquet file format.
141 questions
0 votes
0 answers
Error when saving a dataframe with Date as object into a parquet file
I'm trying to save a dataframe with a Date column as a parquet file. The Date series has dtype object.
On one computer there are no problems (pandas version 1.4.2, fastparquet version 0.7.2). However, when I tried it on another…
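A minimal sketch of one common fix, assuming the error comes from fastparquet having to infer an encoding for the object column: convert Date to a real datetime dtype before writing (the file name here is hypothetical):
import pandas as pd

df = pd.DataFrame({"Date": ["2022-01-01", "2022-01-02"]})  # object dtype, as in the question
df["Date"] = pd.to_datetime(df["Date"])  # give fastparquet a concrete datetime64 dtype
df.to_parquet("dates.parquet", engine="fastparquet")  # hypothetical output path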

Harry
- 331
- 1
- 4
- 14
0 votes
0 answers
Insert data into a Snowflake table from SQLAlchemy
So I am trying to insert data into a Snowflake transient table (from a parquet file), but my syntax doesn't let me get past our SAST check in the pipeline.
Do you see anything wrong with the following code snippet (especially the insert into step…
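Without seeing the snippet one can only guess, but SAST checks typically flag SQL assembled by string formatting. A sketch of a parameterized INSERT with SQLAlchemy (connection URL, table, and column names are all hypothetical):
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("snowflake://user:password@account/db/schema")  # placeholder URL
df = pd.read_parquet("data.parquet")  # hypothetical input file

with engine.begin() as conn:
    conn.execute(
        text("INSERT INTO my_transient_table (id, value) VALUES (:id, :value)"),
        df.to_dict(orient="records"),  # bound parameters instead of string formatting
    )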

Salahddin-Warid
- 105
- 1
- 1
- 5
0 votes
1 answer
Writing Pandas DataFrame to Parquet file?
I am reading data in chunks using pandas.read_sql and appending to a parquet file, but I get errors.
Using pyarrow.parquet:
import pyarrow as pa
import pyarrow.parquet as pq
for chunk in pd.read_sql_query(query, conn, chunksize=10000):
table_data =…
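With pyarrow, the usual chunked-write pattern is a single pq.ParquetWriter opened on the first chunk's schema, since pandas.to_parquet alone can't append. A sketch, assuming query and conn are already defined as in the question (output name hypothetical):
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

writer = None
for chunk in pd.read_sql_query(query, conn, chunksize=10000):
    table = pa.Table.from_pandas(chunk)
    if writer is None:  # open the writer lazily, using the first chunk's schema
        writer = pq.ParquetWriter("output.parquet", table.schema)
    writer.write_table(table)
if writer is not None:
    writer.close()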

Asad Khalil
- 45
- 1
- 10
0 votes
1 answer
How to read a parquet file partitioned by date folder into a dataframe from S3 using Python?
Using Python, I need to go into the cwp folder, then into the date folder, and read the parquet file.
I have this folder structure in S3.
Sample s3 path:
bucket name = lla.analytics.dev
path =…
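A sketch of one way to do it, assuming the s3fs package is installed so pandas can open s3:// paths directly; the date folder name is hypothetical and the prefix is left for you to fill in:
import pandas as pd

path = "s3://lla.analytics.dev/.../cwp/dt=2022-01-01/"  # fill in the real prefix
df = pd.read_parquet(path, engine="fastparquet")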

Pavithra Kannan
- 35
- 8
0 votes
0 answers
How does one write to a parquet file only using fastparquet and in chunks?
The "in chunks" part is probably the hard part.
I need to write each small chunk, upload it to S3, and loop.
I can't use AWS Wrangler, S3FS, or PyArrow 2.0.0+ due to constraints of the codebase (most of my experiments fail as there are issues with…
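A sketch that stays within plain fastparquet plus boto3 (assuming boto3 itself is allowed): write each chunk as its own local file and upload it, so the S3 prefix accumulates a multi-file dataset. The chunk source, bucket, and key names are stand-ins:
import boto3
import pandas as pd
import fastparquet as fp

s3 = boto3.client("s3")
chunks = (pd.DataFrame({"x": [i]}) for i in range(3))  # stand-in for your real chunk source

for i, chunk in enumerate(chunks):
    local = f"part.{i}.parquet"
    fp.write(local, chunk)  # one small parquet file per chunk
    s3.upload_file(local, "my-bucket", f"dataset/part.{i}.parquet")  # hypothetical bucket/key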

caasswa
- 501
- 3
- 10
0 votes
1 answer
How to query parquet files using pyarrow
I have a parquet file with 35 columns, and I have to check whether a specific value is present in a column using pyarrow. Does anyone know how to do that?
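A sketch with pyarrow.compute, reading only the column of interest instead of all 35 (file, column, and value names are hypothetical):
import pyarrow.parquet as pq
import pyarrow.compute as pc

table = pq.read_table("data.parquet", columns=["my_column"])
present = pc.any(pc.equal(table["my_column"], "target_value")).as_py()
print(present)  # True if the value occurs anywhere in the column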

Barkha C
- 1
- 1
0 votes
1 answer
RuntimeError: Decompression 'SNAPPY' not available. Options: ['BROTLI', 'GZIP', 'UNCOMPRESSED'] (error happens only in .py and not in IPython)
I got this error, as in the title, while trying to read parquet files using fastparquet with the following code:
from fastparquet import ParquetFile
pf = ParquetFile('myfile.parquet')
df = pf.to_pandas()
I tried the solutions suggested in this post,…
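A likely culprit is that the script runs under a different interpreter or environment than IPython, and that environment lacks the python-snappy bindings fastparquet looks for. A quick diagnostic sketch to run under both interpreters:
import sys
print(sys.executable)  # confirm which environment each interpreter really uses

try:
    import snappy  # provided by the python-snappy package
    print("snappy OK:", snappy.__file__)
except ImportError as e:
    print("snappy missing:", e)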

Lu W
- 21
- 5
0 votes
2 answers
Conda package bug? Binary incompatibility
I'm working in a remote Jupyter notebook on a system where I don't have root access, or even a shell in which to make many adjustments. I can retrieve packages from Conda's archive and run functions in notebook cells that install packages like…
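Even without a shell, packages can be installed from a notebook cell by driving pip through the kernel's own interpreter, which at least guarantees the wheels land in the environment the notebook actually runs in. A sketch (the package list is hypothetical):
import subprocess
import sys

# install into the exact environment backing this kernel, not whatever pip is on PATH
subprocess.check_call([sys.executable, "-m", "pip", "install", "--user",
                       "numpy", "fastparquet"])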

pauljohn32
- 2,079
- 21
- 28
0 votes
1 answer
Error installing fastparquet on Windows 10
I am trying to install fastparquet in Anaconda on Windows 10. I tried fixing the expected errors by installing Visual Studio Build Tools, following this question.
Steps taken when installing Build Tools:
Visual C++ Build tools core features.
VC++…

Murtaza Haji
- 1,093
- 1
- 13
- 32
0 votes
0 answers
Python import fastparquet gives "double free or corruption (top)" error
When I run import fastparquet, I get this error:
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
double free or corruption…

user15964
- 2,507
- 2
- 31
- 57
0 votes
2 answers
Is there a way to incrementally update Dask metadata file?
I'm trying to process a dataset and make incremental updates while writing it out in Dask. The Dask metadata file helps a lot when it comes to rereading the processed data. However, as I write new partitions/subsets to the same path, the metadata…
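Two approaches that may help, assuming the dataset is written with the fastparquet engine; the frames and output path below are stand-ins:
import glob
import pandas as pd
import dask.dataframe as dd
import fastparquet

first = dd.from_pandas(pd.DataFrame({"x": [1, 2]}), npartitions=1)   # stand-ins for
update = dd.from_pandas(pd.DataFrame({"x": [3, 4]}), npartitions=1)  # the real subsets

first.to_parquet("out/", engine="fastparquet")  # initial write creates _metadata
update.to_parquet("out/", engine="fastparquet", append=True,
                  ignore_divisions=True)        # extends the data files and _metadata

# Alternative: if files were added some other way, rebuild _metadata from the parts
fastparquet.writer.merge(sorted(glob.glob("out/part.*.parquet")))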

Shi Fan
- 1
- 2
0 votes
1 answer
Can't get correct statistics from fastparquet
I am getting None statistics (min/max) when reading a file from S3 using fastparquet.
When calling
fp.ParquetFile(fn=path, open_with=myopen).statistics['min']
most of the values are None, and some are valid.
However, when I read the…
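Worth ruling out first: fastparquet only surfaces whatever statistics the writer actually stored, so None is expected for row groups written without them. A local round trip shows which columns get min/max (the file name is hypothetical):
import pandas as pd
import fastparquet as fp

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
fp.write("stats_check.parquet", df)  # hypothetical local file

pf = fp.ParquetFile("stats_check.parquet")
print(pf.statistics["min"])  # per-column, per-row-group minima; None where not stored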

LeonBam
- 145
- 1
- 12
0 votes
1 answer
Convert multiple CSVs to single partitioned parquet dataset
I have a set of CSV files, one per year of data, each with a YEAR column. I want to convert them into a single parquet dataset, partitioned by year, for later use in pandas. The problem is that the dataframe with all years combined is too large to…
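fastparquet can build the partitioned dataset one CSV at a time, so the combined dataframe never has to fit in memory; partition_on requires the "hive" file scheme, and append extends the dataset after the first write. A sketch (paths hypothetical):
import glob
import pandas as pd
import fastparquet as fp

out = "dataset.parquet"
for i, path in enumerate(sorted(glob.glob("csvs/*.csv"))):  # one CSV per year
    df = pd.read_csv(path)
    fp.write(out, df, file_scheme="hive", partition_on=["YEAR"],
             append=(i > 0))  # first call creates the dataset, later calls extend it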

Anton Babkin
- 595
- 1
- 8
- 12
0 votes
1 answer
InvalidIndexError when mapping a dask series
This mapping works when calling head on the first 100 rows:
ddf['val'] = ddf['myid'].map(val['val'], meta=pd.Series(dtype=float))
But when I try to save to parquet:
ddf.to_parquet('myfile.parquet',
               compression='snappy',
               …
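head() only computes the first partition, so a mapping series with duplicate index labels can pass there and still fail once to_parquet forces every partition. Deduplicating the index before mapping is one thing to try (val and ddf are the objects from the question):
val_unique = val['val'][~val['val'].index.duplicated(keep='first')]  # one value per label
ddf['val'] = ddf['myid'].map(val_unique, meta=('val', 'float64'))
ddf.to_parquet('myfile.parquet', compression='snappy')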

scottlittle
- 18,866
- 8
- 51
- 70
0 votes
0 answers
How to install pyarrow, fastparquet offline?
I want to install pyarrow and fastparquet offline. I have network issues downloading Python packages with pip, so I am trying to download pyarrow from pypi.org/project/pyarrow/#files and install it, but I'm getting an error…
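pip itself can stage an offline install: download the wheels plus all dependencies on a connected machine, copy the directory over, and install without touching the index. A sketch driving pip from Python (the directory name is hypothetical, and the download step must run under a Python version and platform matching the offline box):
import subprocess
import sys

# on the machine WITH network access: fetch wheels and all dependencies
subprocess.check_call([sys.executable, "-m", "pip", "download",
                       "pyarrow", "fastparquet", "-d", "wheels"])

# on the offline machine, after copying the wheels/ directory across:
subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "--no-index", "--find-links", "wheels",
                       "pyarrow", "fastparquet"])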

user2848031
- 187
- 12
- 36
- 69