Questions tagged [fastparquet]

A Python interface to the Parquet file format.

141 questions
0
votes
1 answer

Dask dataframe read parquet format fails from http

I have been dealing with this problem for a week. I use the command from dask import dataframe as ddf; ddf.read_parquet("http://IP:port/webhdfs/v1/user/...") and get an "invalid parquet magic" error. However, ddf.read_parquet is OK with "webhdfs://". I would…
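A minimal sketch of the working variant the asker mentions, with hypothetical host and path values: the raw WebHDFS REST URL over plain http returns JSON metadata unless the right operation parameters are appended, so the reader sees non-parquet bytes ("invalid parquet magic"), whereas fsspec's webhdfs:// protocol issues those requests internally.

```python
import dask.dataframe as dd

# The webhdfs:// protocol (handled by fsspec) performs the WebHDFS REST
# handshake and gives the parquet reader real file access, unlike a bare
# http URL pointing at the /webhdfs/v1/ endpoint.
df = dd.read_parquet(
    "webhdfs://IP:port/user/path/to/data.parquet",  # hypothetical path
    engine="fastparquet",
)
```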
0
votes
1 answer

Is it possible to store a parquet file on disk, while appending, and also retrieving rows by index?

I have 185 files of data, which contain a total of 30 million rows. Each row has two columns: a single int, which I want to use as an index, and a list of 512 ints. So it looks something like this IndexID Ids 1899317 [0, 47715, 1757, 9,…
SantoshGupta7
  • 5,607
  • 14
  • 58
  • 116
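A hedged sketch of one way to get both behaviours with fastparquet, using a hypothetical output path and a placeholder iterable for the 185 pieces: append each file as new row groups, and rely on the per-row-group min/max statistics kept for the index column so readers can skip row groups when looking up rows by index.

```python
import os
import pandas as pd
import fastparquet

path = "dataset.parq"  # hypothetical output location
for df in dataframes:  # placeholder iterable yielding the 185 DataFrames
    # file_scheme="hive" writes a directory dataset; append=True adds the
    # new rows as extra row groups instead of rewriting the file.
    fastparquet.write(path, df, file_scheme="hive",
                      append=os.path.exists(path))
```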
0
votes
1 answer

dask: read parquet from Azure blob - AzureHttpError

I created a parquet file in an Azure blob using dask.dataframe.to_parquet (Moving data from a database to Azure blob storage). I would now like to read that file. I'm doing: STORAGE_OPTIONS={'account_name': 'ACCOUNT_NAME', …
Ray Bell
  • 1,508
  • 4
  • 18
  • 45
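A sketch of the read side under the adlfs/fsspec convention, with hypothetical account and container names: the same storage_options used when writing must be passed back to read_parquet, along with the abfs:// protocol prefix.

```python
import dask.dataframe as dd

# Hypothetical credentials and paths throughout.
STORAGE_OPTIONS = {"account_name": "ACCOUNT_NAME",
                   "account_key": "ACCOUNT_KEY"}
df = dd.read_parquet("abfs://container/folder/data.parquet",
                     storage_options=STORAGE_OPTIONS,
                     engine="fastparquet")
```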
0
votes
1 answer

Dask not recovering partitions from simple (non-Hive) Parquet files

I have a two-part question about Dask+Parquet. I am trying to run queries on a dask dataframe created from a partitioned Parquet file, like so: import pandas as pd import dask.dataframe as dd import fastparquet ##### Generate random data to Simulate…
hda 2017
  • 59
  • 6
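For contrast with the non-Hive case in the question, a small sketch (hypothetical column names) of the Hive-style layout Dask does recover partitions from: partition_on writes key=value subdirectories, and filters on that column prune whole directories at read time.

```python
import pandas as pd
import dask.dataframe as dd

# Write a dataset partitioned on "year" (Hive-style year=... directories).
df = dd.from_pandas(
    pd.DataFrame({"year": [2015, 2016, 2017, 2018] * 25,
                  "value": range(100)}),
    npartitions=4)
df.to_parquet("out_parq", engine="fastparquet", partition_on=["year"])

# Reading with a filter on the partition column skips the other directories.
ddf = dd.read_parquet("out_parq", engine="fastparquet",
                      filters=[("year", "==", 2017)])
```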
0
votes
2 answers

Loading parquet file to Redshift

I am trying to save dataframes to parquet and then load them into Redshift. For that I do the following: parquet_buffer =…
FrankyBravo
  • 438
  • 1
  • 4
  • 12
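One common route, sketched with hypothetical bucket, table, and IAM role names: land the parquet bytes on S3 first, then let Redshift's COPY ingest them natively with FORMAT AS PARQUET.

```python
import io
import boto3
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Serialize to an in-memory buffer and upload it to S3.
parquet_buffer = io.BytesIO()
df.to_parquet(parquet_buffer)
boto3.client("s3").put_object(Bucket="my-bucket", Key="data.parquet",
                              Body=parquet_buffer.getvalue())

# Redshift reads the parquet file directly; run this via your Redshift
# connection of choice.
copy_stmt = """
    COPY my_table FROM 's3://my-bucket/data.parquet'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    FORMAT AS PARQUET;
"""
```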
0
votes
1 answer

Google BigQuery - Error message 'DataFrame' object has no attribute 'to_parquet' even though pyarrow and fastparquet are installed

I'm trying to use the Google BigQuery function load_table_from_dataframe, but I get an error message stating that the DataFrame object has no attribute to_parquet. I have installed both pyarrow and fastparquet but am still getting the same error…
CharlotteB
  • 1
  • 1
  • 2
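A quick check worth running first: DataFrame.to_parquet only exists from pandas 0.21 onward, so the AttributeError points at the pandas version rather than at the parquet engines.

```python
import pandas as pd

# If this prints something older than 0.21, to_parquet is simply absent,
# no matter which parquet engines are installed.
print(pd.__version__)
# Fix: pip install --upgrade pandas
```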
0
votes
0 answers

How to persist kdb tables to compressed parquet?

I'm trying to store/persist kdb tables in compressed Apache Parquet format. My initial plan is basically to use embedPy to make either fastparquet or pyarrow.parquet usable from within q. I'll then use the kdb+ tick architecture to process…
Natalie Williams
  • 355
  • 1
  • 3
  • 9
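A minimal sketch of the Python half that embedPy would call, with hypothetical function and path names: the kdb table, marshalled into a pandas DataFrame, is written as snappy-compressed parquet.

```python
import pandas as pd
import fastparquet

def persist_table(df: pd.DataFrame, path: str) -> None:
    # Snappy gives fast, lightweight compression; fastparquet also accepts
    # "GZIP" and others if smaller files matter more than speed.
    fastparquet.write(path, df, compression="SNAPPY")
```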
0
votes
0 answers

Symbol not found: _PyClass_Type

I'm trying to run some tests from fastparquet using PyCharm on macOS Sierra (10.12.6) but keep failing on: ImportError: dlopen(/Users/dhaviv/Documents/GitHub/fastparquet/fastparquet/speedups.so, 2): Symbol not found: _PyClass_Type I've installed…
Daniel Haviv
  • 1,036
  • 8
  • 16
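One plausible diagnosis, offered as an assumption: _PyClass_Type exists only in CPython 2, so a speedups.so referencing it was built against Python 2 headers while a Python 3 interpreter (or vice versa) is loading it. A small check, with the usual rebuild as a comment:

```python
import sys

# Confirm which interpreter PyCharm's run configuration actually uses;
# a build/run version mismatch produces exactly this dlopen failure.
print(sys.executable, sys.version)
# Rebuilding the extension against that interpreter usually resolves it:
#   pip install --no-cache-dir --force-reinstall fastparquet
```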
0
votes
1 answer

Converting NaN floats to other types in Parquet format

I am currently processing a bunch of CSV files and transforming them into Parquet. I use these with Hive and query the files directly. I would like to switch over to Dask for my data processing. The data I am reading has optional columns, some of…
Eumcoz
  • 2,388
  • 1
  • 21
  • 44
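A sketch assuming a reasonably recent pandas/fastparquet pair and a hypothetical column name: pandas' nullable Int64 extension dtype keeps optional integer columns integral instead of upcasting them to float64 with NaN, and that dtype carries through to the parquet output.

```python
import pandas as pd

# "Int64" (capital I) is the nullable integer extension dtype: missing
# values become pd.NA rather than forcing the column to float.
df = pd.read_csv("input.csv", dtype={"optional_col": "Int64"})  # hypothetical
df.to_parquet("output.parquet", engine="fastparquet")
```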
0
votes
1 answer

Skip metadata for large binary fields in fastparquet

If a dataset has a column with large binary data (e.g. an image or a sound-wave data) then computing min/max statistics for that column becomes costly both in compute and storage requirements, despite being completely useless (querying these values…
stav
  • 1,497
  • 2
  • 15
  • 40
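An assumption to check against the installed version: newer fastparquet releases accept a stats argument to write() naming the columns for which min/max statistics are computed, which would let a large binary column simply be left off the list.

```python
import pandas as pd
import fastparquet

df = pd.DataFrame({"id": [1, 2],
                   "blob": [b"\x00" * 1024, b"\xff" * 1024]})
# Assumed API: stats takes the list of columns to compute statistics for,
# so "blob" gets no (costly, useless) min/max metadata.
fastparquet.write("data.parq", df, stats=["id"])
```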
0
votes
1 answer

How to pass data generated by a Databricks notebook to a Python step?

I am building an Azure Data Factory v2 pipeline, which comprises: a Databricks step to query large tables from Azure Blob storage and generate a tabular result, intermediate_table; a Python step (which does several things and would be cumbersome to put in a…
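One plausible hand-off, sketched with hypothetical container and credential names: the Databricks step writes intermediate_table to Blob storage as parquet, and the Python step reads it back from the same location.

```python
import dask.dataframe as dd

# The Python activity reads the intermediate parquet result the Databricks
# step left in Blob storage; names and keys are placeholders.
df = dd.read_parquet(
    "abfs://container/intermediate_table",
    storage_options={"account_name": "ACCOUNT_NAME",
                     "account_key": "ACCOUNT_KEY"},
).compute()
```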
0
votes
0 answers

Cannot import fastparquet into Python notebook

I am trying to install fastparquet in order to convert a pandas dataframe into a parquet file. But even though I get the following when I run pip install fastparquet: Requirement already satisfied: fastparquet in…
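"Requirement already satisfied" from pip combined with an ImportError in the notebook usually means pip targeted a different interpreter than the one the notebook kernel runs. A quick check:

```python
import sys

# The interpreter the notebook kernel actually uses; install into this one
# rather than whichever `pip` happens to be first on PATH.
print(sys.executable)
# In a notebook cell: !{sys.executable} -m pip install fastparquet
```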
0
votes
2 answers

How can Athena read a parquet file from an S3 bucket?

I am porting a Python project (S3 + Athena) from CSV to Parquet. I can make the parquet file, which can be viewed by Parquet View. I can upload the file to an S3 bucket. I can create the Athena table pointing to the S3 bucket. However, when I…
kzfid
  • 688
  • 3
  • 10
  • 17
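A sketch of the DDL side, with hypothetical table, column, and bucket names: the Athena table has to be declared STORED AS PARQUET, because a table created with the default text/CSV serde will misread parquet files even though they upload cleanly.

```python
# Run this DDL in the Athena console or via boto3's start_query_execution.
ddl = """
CREATE EXTERNAL TABLE my_table (
    id    bigint,
    value double
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-prefix/';
"""
```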
0
votes
1 answer

Unable to read parquet file, giving Gzip code failed error

I am trying to convert a parquet file to CSV with pyarrow. df = pd.read_parquet('test.parquet') The above code works fine with the sample parquet files downloaded from GitHub, but when I try it with the actual large parquet file, it gives the…
Pri31
  • 447
  • 1
  • 5
  • 9
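A narrowing step rather than a fix, with a hypothetical file name: reading the same file with each engine separates a truncated or corrupted file (both fail) from a codec-support gap in one engine (only one fails).

```python
import pandas as pd

# If both raise on decompression, suspect the file (truncated upload,
# interrupted write); if only one raises, suspect that engine's codec
# support in the current environment.
df = pd.read_parquet("large_file.parquet", engine="pyarrow")
df = pd.read_parquet("large_file.parquet", engine="fastparquet")
```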
0
votes
1 answer

Is it a bug in the fastparquet module?

I am using an AWS SageMaker Jupyter notebook and getting the following error: in () 1 import s3fs ----> 2 import fastparquet as fp 3 s3 = s3fs.S3FileSystem() 4 fs = s3fs.core.S3FileSystem() 5…