Questions tagged [fastparquet]

A Python interface to the Parquet file format.

141 questions
0 votes, 0 answers

Error reported when saving dataframe with date as object into a parquet file

I'm trying to save a dataframe with Date as a column to a parquet file. The Date series is of type object. On one computer there are no problems (pandas version is 1.4.2, fastparquet version is 0.7.2). However, when I tried it on another…
asked by Harry
0 votes, 0 answers

Insert data into a Snowflake table from SQLAlchemy

So I am trying to insert data into a Snowflake transient table (from a parquet file), but my syntax doesn't allow me to go past our SAST test in the pipeline. Do you see anything wrong with the following code snippet (especially the insert into step…
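A hedged sketch of the parameterized-insert pattern that typically passes SAST checks (no SQL built by string concatenation). An in-memory SQLite engine stands in for the Snowflake connection here, and the table and column names are invented; with snowflake-sqlalchemy installed, the same execute calls work against a Snowflake engine.

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical stand-in: SQLite replaces the Snowflake connection.
engine = sa.create_engine("sqlite:///:memory:")

# e.g. loaded from a parquet file with pd.read_parquet(...)
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE my_table (id INTEGER, name TEXT)"))
    # Parameterized executemany: values are bound, never interpolated
    # into the SQL string, which is what SAST scanners usually flag.
    conn.execute(
        sa.text("INSERT INTO my_table (id, name) VALUES (:id, :name)"),
        df.to_dict(orient="records"),
    )
    n = conn.execute(sa.text("SELECT COUNT(*) FROM my_table")).scalar()
```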
0 votes, 1 answer

Writing Pandas DataFrame to Parquet file?

I am reading data in chunks using pandas.read_sql and appending to a parquet file, but I get errors. Using pyarrow.parquet: import pyarrow as pa; import pyarrow.parquet as pq; for chunk in pd.read_sql_query(query, conn, chunksize=10000): table_data =…
asked by Asad Khalil
0 votes, 1 answer

How to read parquet file partitioned by date folder to dataframe from s3 using python?

Using Python, I need to go down to the cwp folder, get into the date folder, and read the parquet file. I have this folder structure inside S3. Sample s3 path: bucket name = lla.analytics.dev path =…
0 votes, 0 answers

How does one write to a parquet file only using fastparquet and in chunks?

Writing in chunks is probably the hard part. I need to write each small chunk, then upload to S3, and loop that. I can't use AWS Wrangler, S3FS, or Pyarrow 2.0.0+ due to constraints of the codebase (most of my experiments fail as there are issues with…
asked by caasswa
0 votes, 1 answer

How to query on parquet files using pyarrow

I have a parquet file with 35 columns and I have to check whether a specific value is present in a column using pyarrow. Does anyone know how to do that?
asked by Barkha C
0 votes, 1 answer

RuntimeError: Decompression 'SNAPPY' not available. Options: ['BROTLI', 'GZIP', 'UNCOMPRESSED'] (error happens only in .py and not in .ipython)

I got the error in the title while trying to read parquet files using fastparquet with the following code: from fastparquet import ParquetFile; pf = ParquetFile('myfile.parquet'); df = pf.to_pandas(). I tried the solutions suggested in this post,…
asked by Lu W
0 votes, 2 answers

Conda package bug? Binary incompatibility

I'm working in a remote Jupyter notebook on a system where I don't have root access, or even a shell in which to make many adjustments. I can retrieve packages from Conda's archive and run functions in notebook cells that install packages like…
asked by pauljohn32
0 votes, 1 answer

Error installing fastparquet on Windows 10

I am trying to install fastparquet in Anaconda on Windows 10. I tried fixing the expected errors by installing Visual Studio Build Tools, following this question. Steps taken when installing Build Tools: Visual C++ Build Tools core features, VC++…
asked by Murtaza Haji
0 votes, 0 answers

python import fastparquet got "double free or corruption (top)" error

When I run import fastparquet I get an error: Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] :: Anaconda, Inc. on linux >>> import fastparquet double free or corruption…
asked by user15964
0 votes, 2 answers

Is there a way to incrementally update Dask metadata file?

I'm trying to process a dataset and make incremental updates while writing it out with Dask. The Dask metadata file would help a lot when it comes to rereading the processed data. However, as I write new partitions/subsets to the same path, the metadata…
0 votes, 1 answer

Can't get correct statistics from fastparquet

I am getting None statistics (min/max) when reading a file from S3 using fastparquet. When calling fp.ParquetFile(fn=path, open_with=myopen).statistics['min'], most of the values are None and some are valid. However, when I read the…
asked by LeonBam
0 votes, 1 answer

Convert multiple CSVs to single partitioned parquet dataset

I have a set of CSV files, one per year of data, with a YEAR column in each. I want to convert them into a single parquet dataset, partitioned by year, for later use in pandas. The problem is that a dataframe with all years combined is too large to…
asked by Anton Babkin
0 votes, 1 answer

InvalidIndexError when mapping a dask series

This mapping works when calling head on the first 100 rows: ddf['val'] = ddf['myid'].map( val['val'] , meta=pd.Series(float) ) But when I try to save to parquet: ddf.to_parquet('myfile.parquet', compression='snappy', …
asked by scottlittle
0 votes, 0 answers

How to install pyarrow, fastparquet offline?

I want to install pyarrow and fastparquet offline. I have a network issue downloading Python packages using pip, so I am trying to download pyarrow from pypi.org/project/pyarrow/#files and install it, but I'm getting error…
asked by user2848031
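A common offline workflow, sketched with pip's own download and --no-index flags: fetch the wheels (plus their dependencies) on a machine that has internet access, copy the directory over, then install without contacting PyPI.

```shell
# On a machine with internet access: download wheels + dependencies.
pip download pyarrow fastparquet -d ./pkgs

# Copy ./pkgs to the offline machine, then install from it only:
pip install --no-index --find-links ./pkgs pyarrow fastparquet
```

Downloading on a machine with the same OS, architecture, and Python version as the offline target matters here, since pyarrow ships platform-specific binary wheels.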