Questions tagged [fastparquet]

A Python interface to the Parquet file format.

141 questions
0 votes, 0 answers

Error reported when saving dataframe with date as object into a parquet file

I'm trying to save a dataframe with Date as a column to a parquet file. The Date series is of type object. On one computer there are no problems (pandas version is 1.4.2, fastparquet version is 0.7.2). However, when I tried it on another…
asked by Harry
0 votes, 0 answers

Insert data into a Snowflake table from SQLAlchemy

So I am trying to insert data into a Snowflake transient table (from a parquet file), but my syntax doesn't allow me to go past our SAST test in the pipeline. Do you see anything wrong with the following code snippet (especially the insert into step…
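A hedged sketch of the parameterized-insert pattern that typically passes SAST checks (no SQL built by string concatenation). An in-memory SQLite engine stands in for the Snowflake connection here, and the table and column names are invented; with snowflake-sqlalchemy installed, the same execute calls work against a Snowflake engine.

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical stand-in: SQLite replaces the Snowflake connection.
engine = sa.create_engine("sqlite:///:memory:")

# e.g. loaded from a parquet file with pd.read_parquet(...)
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE my_table (id INTEGER, name TEXT)"))
    # Parameterized executemany: values are bound, never interpolated
    # into the SQL string, which is what SAST scanners usually flag.
    conn.execute(
        sa.text("INSERT INTO my_table (id, name) VALUES (:id, :name)"),
        df.to_dict(orient="records"),
    )
    n = conn.execute(sa.text("SELECT COUNT(*) FROM my_table")).scalar()
```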
0 votes, 1 answer

Writing Pandas DataFrame to Parquet file?

I am reading data in chunks using pandas.read_sql and appending to a parquet file, but I get errors. Using pyarrow.parquet: import pyarrow as pa; import pyarrow.parquet as pq; for chunk in pd.read_sql_query(query, conn, chunksize=10000): table_data =…
asked by Asad Khalil
0 votes, 1 answer

How to read parquet file partitioned by date folder to dataframe from s3 using python?

Using Python, I need to go down to the cwp folder, get into the date folder, and read the parquet file. I have this folder structure inside S3. Sample s3 path: bucket name = lla.analytics.dev path =…
0 votes, 0 answers

How does one write to a parquet file only using fastparquet and in chunks?

Writing in chunks is probably the hard part. I need to write each small chunk, then upload to S3, and loop that. I can't use AWS Wrangler, S3FS, or Pyarrow 2.0.0+ due to constraints of the codebase (most of my experiments fail as there are issues with…
asked by caasswa
0 votes, 1 answer

How to query on parquet files using pyarrow

I have a parquet file with 35 columns and I have to check whether a specific value is present in a column using pyarrow. Does anyone know how to do that?
asked by Barkha C
0 votes, 1 answer

RuntimeError: Decompression 'SNAPPY' not available. Options: ['BROTLI', 'GZIP', 'UNCOMPRESSED'] (error happens only in .py and not in .ipython)

I got the error in the title while trying to read parquet files using fastparquet with the following code: from fastparquet import ParquetFile; pf = ParquetFile('myfile.parquet'); df = pf.to_pandas(). I tried the solutions suggested in this post,…
asked by Lu W
0 votes, 2 answers

Conda package bug? Binary incompatibility

I'm working in a remote Jupyter notebook on a system where I don't have root access, or even a shell in which to make many adjustments. I can retrieve packages from Conda's archive and run functions in notebook cells that install packages like…
asked by pauljohn32
0 votes, 1 answer

Error installing fastparquet on Windows 10

I am trying to install fastparquet in Anaconda on Windows 10. I tried fixing the expected errors by installing Visual Studio Build Tools, following this question. Steps taken when installing Build Tools: Visual C++ Build Tools core features, VC++…
asked by Murtaza Haji
0 votes, 0 answers

python import fastparquet got "double free or corruption (top)" error

When I run import fastparquet I get an error: Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] :: Anaconda, Inc. on linux >>> import fastparquet double free or corruption…
asked by user15964
0 votes, 2 answers

Is there a way to incrementally update Dask metadata file?

I'm trying to process a dataset and make incremental updates while writing it out with Dask. The Dask metadata file would help a lot when it comes to rereading the processed data. However, as I write new partitions/subsets to the same path, the metadata…
0 votes, 1 answer

Can't get correct statistics from fastparquet

I am getting None statistics (min/max) when reading a file from S3 using fastparquet. When calling fp.ParquetFile(fn=path, open_with=myopen).statistics['min'], most of the values are None and some are valid. However, when I read the…
asked by LeonBam
0 votes, 1 answer

Convert multiple CSVs to single partitioned parquet dataset

I have a set of CSV files, one per year of data, with a YEAR column in each. I want to convert them into a single parquet dataset, partitioned by year, for later use in pandas. The problem is that a dataframe with all years combined is too large to…
asked by Anton Babkin
0 votes, 1 answer

InvalidIndexError when mapping a dask series

This mapping works when calling head on the first 100 rows: ddf['val'] = ddf['myid'].map( val['val'] , meta=pd.Series(float) ) But when I try to save to parquet: ddf.to_parquet('myfile.parquet', compression='snappy', …
asked by scottlittle
0 votes, 0 answers

How to install pyarrow, fastparquet offline?

I want to install pyarrow and fastparquet offline. I have a network issue downloading Python packages using pip, so I am trying to download pyarrow from pypi.org/project/pyarrow/#files and install it, but I'm getting error…
asked by user2848031
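A common offline workflow, sketched with pip's own download and --no-index flags: fetch the wheels (plus their dependencies) on a machine that has internet access, copy the directory over, then install without contacting PyPI.

```shell
# On a machine with internet access: download wheels + dependencies.
pip download pyarrow fastparquet -d ./pkgs

# Copy ./pkgs to the offline machine, then install from it only:
pip install --no-index --find-links ./pkgs pyarrow fastparquet
```

Downloading on a machine with the same OS, architecture, and Python version as the offline target matters here, since pyarrow ships platform-specific binary wheels.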