Questions tagged [parquet-dataset]

14 questions
3 votes · 2 answers

AWS Athena - UPDATE table rows using SQL

I am a newbie to the AWS ecosystem. I am creating an application which queries data using AWS Athena. Data is transformed from JSON into Parquet using AWS Glue and stored in S3. Now the use case is to update that Parquet data using SQL. Can we update…
mds404 · 371 · 4 · 9
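
For context: Athena cannot UPDATE rows in a plain Parquet table; row-level UPDATE is only supported on Apache Iceberg tables (Athena engine v3), otherwise the data has to be rewritten (e.g. via CTAS). A minimal sketch of running such an update from Python, assuming an Iceberg table and hypothetical database, table, and bucket names:

```python
# Minimal sketch (hypothetical names): run a row-level UPDATE from Python via boto3.
# This only works if the target table is an Apache Iceberg table; plain Parquet
# tables in Athena cannot be updated in place.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="UPDATE my_iceberg_table SET status = 'SHIPPED' WHERE order_id = 42",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```
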
1 vote · 0 answers

How to prevent delay in chart rendering from .parquet data fetched from Flask backend?

I am trying to create a simple GUI dashboard by fetching data from a back-end Flask server, triggering an AJAX request when I interact with the multi-checkbox drop-down menus. Essentially, I have two drop-down menus called "Select Date" and…
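
One common way to cut this kind of rendering delay is to filter server-side and return only the chart-ready slice, rather than shipping the whole Parquet file to the browser. A minimal sketch, assuming hypothetical file and column names:

```python
# Minimal sketch (hypothetical file/column names): the endpoint reads only the
# needed columns and returns just the rows matching the selected date.
from flask import Flask, jsonify, request
import pandas as pd

app = Flask(__name__)

@app.route("/chart-data")
def chart_data():
    selected_date = request.args.get("date")          # value from the drop-down
    df = pd.read_parquet("data.parquet", columns=["date", "value"])
    if selected_date:
        df = df[df["date"] == selected_date]
    return jsonify(df.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)
```
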
1 vote · 0 answers

Improving read performance of pyarrow

I have a partitioned dataset stored on an internal S3 cloud. I am reading the dataset as a pyarrow table: import pyarrow.dataset as ds; my_dataset = ds.dataset(ds_name, format="parquet", filesystem=s3file, partitioning="hive"); fragments =…
Femi King · 11 · 2
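
For reads like this, pushing partition filters and column projection into to_table() usually helps, since only the matching fragments and columns are fetched from S3. A minimal sketch, assuming hypothetical paths, column names, and an S3-compatible endpoint:

```python
# Minimal sketch (hypothetical paths/columns): push filters and projection
# down so pyarrow only fetches the matching Hive partitions from S3.
import pyarrow.dataset as ds
from pyarrow import fs

s3 = fs.S3FileSystem(endpoint_override="https://internal-s3.example.com")

dataset = ds.dataset(
    "bucket/path/to/dataset",
    format="parquet",
    filesystem=s3,
    partitioning="hive",
)

table = dataset.to_table(
    columns=["id", "value"],                                   # read only needed columns
    filter=(ds.field("year") == 2023) & (ds.field("month") == 7),
)
```
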
1 vote · 0 answers

Pyarrow's write_to_dataset() causes "Calling the invoke API action failed with this message: Network Error" when partition_cols provided in AWS Lambda

I have an AWS Lambda Function (Python 3.8) with pyarrow 9.0.0 and s3fs bundled together in a layer. The function reads multiple JSON files one by one and converts them into a Parquet dataset with partitioning (year, month, day) in an S3 location. When…
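
For reference, a minimal sketch of the write path itself, assuming hypothetical bucket and column names (the question's Lambda layer and networking setup are not reproduced here):

```python
# Minimal sketch (hypothetical bucket/column names): convert JSON records into
# a year/month/day-partitioned Parquet dataset on S3 with pyarrow + s3fs.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem()

df = pd.read_json("s3://my-bucket/input/events.json", lines=True)
ts = pd.to_datetime(df["timestamp"])          # hypothetical timestamp column
df["year"], df["month"], df["day"] = ts.dt.year, ts.dt.month, ts.dt.day

table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_to_dataset(
    table,
    root_path="my-bucket/output/events",
    partition_cols=["year", "month", "day"],
    filesystem=fs,
)
```
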
1 vote · 0 answers

ParquetDataset not taking the partitions from the filters

I have a Parquet dataset stored on S3, and I would like to query specific rows from it. I am doing it using pyarrow. My S3 dataset is partitioned by client, year, month, and day using Hive partitioning (client=, year=, ...). I am giving the…
Mhmd Dar · 13 · 3
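
For reference, partition pruning via filters generally looks like the sketch below; on older pyarrow versions, non-partition filters additionally require the non-legacy dataset implementation (use_legacy_dataset=False). Paths and partition values are hypothetical:

```python
# Minimal sketch (hypothetical path and partition values): filters on Hive
# partition columns prune which client=/year=/month=/day= directories are read.
import pyarrow.parquet as pq
from pyarrow import fs

s3 = fs.S3FileSystem()

dataset = pq.ParquetDataset(
    "bucket/path/to/dataset",
    filesystem=s3,
    filters=[
        ("client", "=", "acme"),
        ("year", "=", 2023),
        ("month", "=", 7),
        ("day", "=", 15),
    ],
)
table = dataset.read()
```
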
0 votes · 0 answers

Dask dataframe creates folders instead of files when saving processed files to parquet

I have some very large Parquet files on which I want to do some processing, merging, and cleaning, and then save the results into another folder. I am using a Dask dataframe since it's the only way I can read those files without getting out of…
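
For context: dd.to_parquet() always writes a directory of part files, one per partition, so the folders are expected. A minimal sketch (hypothetical paths) of both the normal folder output and a per-partition workaround that produces standalone files:

```python
# Minimal sketch (hypothetical paths): dd.to_parquet() always creates a
# directory of part-*.parquet files. To end up with plain files instead,
# one option is to write each partition yourself with pandas.
import dask.dataframe as dd

ddf = dd.read_parquet("input/*.parquet")
ddf = ddf.dropna()                      # stand-in for the cleaning/merging steps

# Option 1: accept the folder layout (this is Dask's normal output).
ddf.to_parquet("output/cleaned/", write_index=False)

# Option 2: one standalone file per partition, written via pandas.
for i, part in enumerate(ddf.to_delayed()):
    part.compute().to_parquet(f"output/cleaned_part_{i}.parquet", index=False)
```
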
0 votes · 1 answer

partitioning a Parquet file in Data Factory

I am doing my project in Data Factory and I need to save information recurrently into the same Parquet file. Every so often there is an update to the information, and I would like it to be added to the Parquet as a partition of the…
0 votes · 0 answers

Reading Parquet v2 file with Javascript

I've searched through the node package manager (NPM) and I can't seem to find a working Parquet library that also supports version 2. parquets was the only working parser I could find, and I got this…
Hackermon · 78 · 1 · 7
0 votes · 0 answers

Parquet - Specifying file path when using external key material

I have a use case where I have to encrypt my Parquet files. I implemented the KMSClient abstract class provided by Parquet's CryptoFactory and have been able to encrypt and decrypt the Parquet files and the DEK. While the above is working as expected,…
Alex Bloomberg · 855 · 1 · 7 · 14
0 votes · 0 answers

Join 2 large tables (50 GB and 1 billion records)

I have 2 super-large tables which I am loading as dataframes in Parquet format with one join key. Now the issues I need help with: I need to tune the job, as I am getting OOM errors due to Java heap space. I have to apply a left join. There will not be any…
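
The question doesn't show code, but a left join at this scale typically runs as a sort-merge join; the usual levers are projecting only the needed columns and raising the shuffle-partition count so each task fits in the heap. A minimal PySpark sketch with hypothetical paths and column names:

```python
# Minimal sketch (hypothetical paths/column names): left join of two large
# Parquet tables with trimmed columns and a higher shuffle-partition count.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("big-left-join")
    .config("spark.sql.shuffle.partitions", "2000")   # more, smaller shuffle tasks
    .getOrCreate()
)

left = spark.read.parquet("s3://bucket/table_a").select("join_key", "a_col")
right = spark.read.parquet("s3://bucket/table_b").select("join_key", "b_col")

joined = left.join(right, on="join_key", how="left")
joined.write.mode("overwrite").parquet("s3://bucket/joined")
```
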
0 votes · 1 answer

Reading Parquet file from Spark

I use the following method to read a Parquet file in Spark: scala> val df = spark.read.parquet("hdfs:/ORDER_INFO"); scala> df.show(). When I show the content of the DataFrame, it displays encoded values like [49 4E 53 5F 32 33] and [49 4E 53 5F 32 30]. In…
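
Values like [49 4E 53 5F 32 33] are the raw bytes of the string "INS_23", i.e. the column was written as binary; casting it to string makes df.show() print readable text. A minimal sketch in PySpark (the question uses the Scala shell; the column name here is hypothetical):

```python
# Minimal sketch (PySpark; hypothetical column name): the column was written
# as binary, so df.show() prints hex bytes. Casting to string fixes the display.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("hdfs:/ORDER_INFO")
df = df.withColumn("order_id", col("order_id").cast("string"))
df.show()
```
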
0 votes · 0 answers

How to import a parquet.gzip.cpgz file?

I am trying to open the following file in R: deputies.parquet.gzip.cpgz. Does anyone know how to do this? I have imported Parquet files before using the arrow package, but I'm not sure how to import this type.
w5698 · 159 · 7
0 votes · 1 answer

Load Parquet Files from ADLS Gen2 using ADF

I would like to set up an ADF pipeline to load all the Parquet files hosted for 2+ years on ADLS Gen2 with a hierarchy of Year -> Month -> Day -> Hour -> Min. Over that period, we did have some file structure changes with a…
0 votes · 0 answers

Parquet schema / data type for entire null object DataFrame columns

I'm writing a DataFrame to binary Parquet format with one or more entirely-null object columns. If I then load the Parquet dataset with use_legacy_dataset=False: parquet_dataset = pq.ParquetDataset(root_path, use_legacy_dataset=False,…
mishbah · 5,487 · 5 · 25 · 35
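
For context: a pandas object column that is entirely None is inferred by pyarrow as the null type, which is what usually surprises people on reload. A minimal sketch (hypothetical column names) of forcing a concrete type by passing an explicit schema when writing:

```python
# Minimal sketch (hypothetical column names): an all-None object column is
# inferred as Arrow's "null" type; supplying an explicit schema when writing
# gives it a concrete type (string here) that survives the round trip.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"id": [1, 2, 3], "comment": [None, None, None]})

schema = pa.schema([
    ("id", pa.int64()),
    ("comment", pa.string()),     # force a real type instead of pa.null()
])

table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
pq.write_to_dataset(table, root_path="dataset_root")

loaded = pq.ParquetDataset("dataset_root").read()
print(loaded.schema)
```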