Questions tagged [parquet]

Apache Parquet is a columnar storage format for Hadoop.

Parquet was created to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.
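For a concrete sense of the format, here is a minimal sketch of writing and reading a Parquet file from Python with pandas; the file and column names are arbitrary:

    import pandas as pd

    # Write a small frame to a compressed, columnar file on disk.
    df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    df.to_parquet("example.parquet", engine="pyarrow")

    # Columnar layout means a reader can pull back just the columns it needs.
    roundtrip = pd.read_parquet("example.parquet", columns=["value"])
    print(roundtrip)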

3891 questions
1 vote • 1 answer

How to read a parquet file in Azure Databricks?

I have a few parquet files stored in my storage account, which I am trying to read using the code below. However, it fails with an "incorrect syntax" error. Can someone suggest the correct way to read parquet files using Azure…
ZZZSharePoint • 1,163 • 1 • 19 • 54
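A hedged sketch of one common approach, assuming the files live in ADLS Gen2 and the cluster is already authorized against the storage account; the container, account, and path names are hypothetical placeholders:

    # `spark` is the session Databricks provides in every notebook.
    df = spark.read.parquet(
        "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/"
    )
    df.show(5)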
1 vote • 1 answer

Databricks - Autoloader - Not Terminating?

I'm new to Databricks, and I have several Azure Blob .parquet locations I'm pulling data from and want to put through Auto Loader so I can "create table ... using delta location ''" in SQL in another step. (Each parquet file is in its own…
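Auto Loader runs as a streaming query, so by default it never terminates; one approach is a finite trigger that processes the files currently present and then stops. A sketch, assuming Databricks Runtime 10.1+ for availableNow (older runtimes would use trigger(once=True)); all paths here are hypothetical:

    (spark.readStream
        .format("cloudFiles")                        # Auto Loader source
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation", "/tmp/schema")
        .load("wasbs://container@account.blob.core.windows.net/input/")
        .writeStream
        .format("delta")                             # target for the later CREATE TABLE ... USING DELTA
        .option("checkpointLocation", "/tmp/checkpoint")
        .trigger(availableNow=True)                  # drain what's there, then stop
        .start("/mnt/delta/target")
        .awaitTermination())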
1 vote • 2 answers

Dask writing into multiple parquet files by key

I have a very large dataset on disk as a CSV file. I would like to load this into Dask, do some cleaning, and then save the data for each value of date into a separate file/folder, as follows: . └── test └── 20211201 └── part.0.parquet …
Nezo • 567 • 4 • 18
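A sketch of one way to do this with Dask's partition_on, which writes one sub-directory per distinct key; note that it produces hive-style names (test/date=20211201/part.0.parquet) rather than the bare 20211201 shown above. The file and column names are assumptions:

    import dask.dataframe as dd

    ddf = dd.read_csv("large.csv")
    # ... cleaning steps ...
    ddf.to_parquet("test/", partition_on=["date"])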
1 vote • 1 answer

Is there an efficient way of changing a feather file to a parquet file?

I have a big Feather file, which I want to change to Parquet so that I can work with PySpark. Is there a more efficient way of changing the file type than doing the following: df = pd.read_feather('file.feather').set_index('date') df_parquet =…
TiTo • 833 • 2 • 7 • 28
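Since Feather files are already Arrow tables, one plausibly cheaper route converts at the Arrow level and skips the pandas round-trip entirely (the set_index step is dropped, which is usually harmless because Spark ignores the pandas index anyway). A sketch with assumed file names:

    import pyarrow.feather as feather
    import pyarrow.parquet as pq

    table = feather.read_table("file.feather")  # reads straight into an Arrow table
    pq.write_table(table, "file.parquet")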
1 vote • 0 answers

Tried reading a parquet file in Spring Boot without using Spark. It works on the local machine but fails when deployed in an AWS ECS container

Getting this error: Can not read value at 0 in block -1 in file file:localdirectory/samplefile.parquet. I have to read a directory containing parquet files from an S3 bucket. For this, I am downloading the directory from S3 locally and reading it in…
mold_9580 • 11 • 2
1 vote • 0 answers

Redshift COPY error: "Assert code: 1000 context: Reached unreachable code - Invalid type: 6551 query"

We are trying to copy data from S3 (parquet files) to Redshift. Here are the respective details. Athena DDL: CREATE EXTERNAL TABLE tablename( `id` int, `col1` int, `col2` date, `col3` string, `col4` decimal(10,2), binarycol binary); Redshift DDL: CREATE…
1 vote • 1 answer

Dir columns returned by default when querying parquet files in Apache Drill 1.20

In the latest version of Drill, the dir columns are returned by default when running a 'select *' on a parquet file. Is there a way we can disable them? The query: 'Select * from dfs.`C:\Sample.parquet` where EmpID <>'null'' The result for the above…
Rik • 81 • 1 • 15
1 vote • 1 answer

Streaming parquet files from S3 (Python)

I should begin by saying that this is not running in Spark. What I am attempting to do is: stream n records from a parquet file in S3, process, stream back to a different file in S3 ... but am only inquiring about the first step. Have tried various…
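For that first step, a hedged sketch using pyarrow's batch iterator over an S3 object, so the whole file never has to fit in memory; the bucket/key and the process() callback are hypothetical:

    import pyarrow.parquet as pq
    import s3fs

    fs = s3fs.S3FileSystem()
    with fs.open("my-bucket/data/file.parquet", "rb") as f:
        pf = pq.ParquetFile(f)
        for batch in pf.iter_batches(batch_size=1000):  # n records at a time
            process(batch.to_pandas())                  # stand-in for the caller's logic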
1 vote • 0 answers

Read a parquet.snappy file from AWS S3 in React Native

I am working on an app with React Native, and we are at a point where we need to read a parquet.snappy file from an S3 bucket in the app. Is there any library for that?
1 vote • 1 answer

Continuously Updating Partitioned Parquet

I have a Spark script that pulls data from a database and writes it to S3 in parquet format. The parquet data is partitioned by date. Because of the size of the table, I'd like to run the script daily and have it just rewrite the most recent few…
maxwellray • 99 • 7
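One approach is Spark's dynamic partition-overwrite mode, which rewrites only the date partitions present in the incoming frame and leaves the rest untouched. A sketch; the DataFrame name and output path are assumptions:

    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    (recent_df                    # e.g. just the last few days pulled from the database
        .write
        .mode("overwrite")        # with dynamic mode, only matching partitions are replaced
        .partitionBy("date")
        .parquet("s3://my-bucket/table/"))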
1 vote • 2 answers

Databricks: reading data with .snappy.parquet extension

I have a table with the .snappy.parquet extension. data= 'part-001-36b4-7ea3-4165-8742-2f32d8643d-c000.snappy.parquet' I would like to read this, and I tried the following: table = spark.read.load(data, format='delta') When I try with the above…
Hiwot • 568 • 5 • 18
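A .snappy.parquet file is still plain Parquet (snappy is only the compression codec), so reading it with format='delta' fails unless the directory is actually a Delta table. A sketch of the plain-Parquet read:

    data = 'part-001-36b4-7ea3-4165-8742-2f32d8643d-c000.snappy.parquet'
    table = spark.read.parquet(data)   # the codec is detected from the file itself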
1 vote • 1 answer

Parquet file with more than one schema

I am used to parquet files with a single schema. I came across a file which seemingly has more than one schema. I used pandas to convert it to a CSV file. The result is something like this: table-1,table-2,table-3 0, {data for table-1} {dat for…
lang2 • 11,433 • 18 • 83 • 133
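A Parquet file has exactly one schema, but nested struct columns can look like several tables once flattened to CSV; a sketch for inspecting what the file really contains, with an assumed file name:

    import pyarrow.parquet as pq

    print(pq.read_schema("file.parquet"))           # full, possibly nested schema
    print(pq.ParquetFile("file.parquet").metadata)  # row groups, row counts, codecs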
1 vote • 2 answers

Selecting deep columns in pyarrow.dataset parquet

Let's say I have a deeply nested arrow table like: pyarrow.Table arr: struct not null, b: list not null> not null> child 0, arr: struct not null, b:…
mdurant • 27,272 • 5 • 45 • 74
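A hedged sketch of projecting just one nested field with a dataset expression; the arr/a names mirror the (truncated) schema above, the path is an assumption, and nested field references of this form need a reasonably recent pyarrow:

    import pyarrow.dataset as ds
    import pyarrow.compute as pc

    dataset = ds.dataset("data/", format="parquet")
    # pc.field("arr", "a") refers to the nested child `a` inside the `arr` struct.
    table = dataset.to_table(columns={"arr_a": pc.field("arr", "a")})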
1 vote • 1 answer

Can't view Staged Parquet File in S3 from Snowflake

I'm working on moving some Parquet files in S3 over to Snowflake. The Storage Integration & External Stage were created just fine, and when I run the list @mystage command I can see the file that I want to check out in S3 so I know it exists & that…
jyablonski • 711 • 1 • 7 • 17
1 vote • 0 answers

Timestamp conversion in Kinesis Firehose after record format conversion to Parquet

I have been using record format conversion in Kinesis Firehose to convert files to parquet format in S3, with the schema stored in AWS Glue. I am struggling with an issue where I am unable to configure the timestamp…