Highest Voted 'spark-redshift' Questions

0

votes

1 answer

Best way to process Redshift data on Spark (EMR) via Airflow MWAA?

We have an Airflow MWAA cluster and a huge volume of data in our Redshift data warehouse. We currently process the data directly in Redshift (with SQL), but given the amount of data, this puts a lot of pressure on the data warehouse and it is less and…

asked Nov 17 '22 at 11:42

val

329
2
16

0

votes

1 answer

Unload all tables from Redshift to S3 - CPU usage

The goal is to unload a few tables (for each customer) every few hours to S3 in Parquet format. Each table is around 1 GB (CSV format); in Parquet it is around 120 MB. The issue is that when running 2-3 parallel UNLOAD commands, the CPU of the Redshift nodes…

amazon-web-services amazon-redshift amazon-redshift-spectrum spark-redshift

asked Aug 23 '22 at 16:00

omri_saadon

10,193
7
33
58

0

votes

1 answer

Is it possible to load partitioned parquet files using Redshift COPY command?

As an example, let's say I have a Parquet file in S3 partitioned by the column date with the following format: s3://my_bucket/path/my_table/date=* So when I load the table using Spark, for example, it shows the…

amazon-redshift spark-redshift

asked May 10 '22 at 20:40

Henrique Florencio

3,440
1
18
19

0

votes

2 answers

Redshift external catalog error when copying Parquet from S3

I am trying to copy Google Analytics data into Redshift via Parquet format. When I limit the columns to a few select fields, I am able to copy the data. But when I include a few specific columns, I get an error: ERROR: External Catalog Error. Detail:…

amazon-web-services amazon-s3 amazon-redshift parquet spark-redshift

asked May 05 '22 at 10:27

Sandeep Singh

432
6
17

0

votes

1 answer

EMR PySpark write to Redshift: java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only

I got an error when trying to write data to Redshift using PySpark on an EMR cluster. df.write.format("jdbc") \ .option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db") \ .option("driver",…

apache-spark pyspark amazon-redshift amazon-emr spark-redshift

asked May 25 '21 at 21:41

Jose Montoya

1

0

votes

1 answer

How to optimize ETL data pipeline for fault tolerance when using Spark and Redshift?

I'm writing a big batch job using PySpark that ETLs 200 tables and loads them into Amazon Redshift. These 200 tables are created from one input data source, so the batch job is successful only when data is loaded into ALL 200 tables successfully. The…

apache-spark amazon-redshift spark-redshift

asked Apr 08 '21 at 22:23

snackbar

93
7

0

votes

0 answers

AWS, dotnet Spark, and Redshift are not working

Hi, I am having problems getting Redshift and dotnet Spark working. This is the configuration I use to get it working in debug mode: C:\bin\spark-2.4.1-bin-hadoop2.7\bin\spark-submit.cmd ` --jars…

.net amazon-web-services apache-spark amazon-s3 spark-redshift

asked Sep 27 '20 at 16:39

Guillermo de la Torre Cárdenas

1
1

0

votes

1 answer

I would like to know whether the spark-redshift libraries are open source/free to use or have to be licensed via Databricks

I want to use the spark-redshift libraries for writing data from AWS S3 to AWS Redshift using the following code. Before using this, I would like to know whether the spark-redshift libraries are open source/free to use or have to be licensed via…

pyspark amazon-redshift databricks spark-redshift

asked Sep 10 '20 at 11:36

Sow

71
1
4

0

votes

0 answers

Can we create a day-wise snapshot in the target database (Redshift) as rows using Debezium?

Can we create a day-wise snapshot of a table in the target database as rows using Debezium?

amazon-redshift debezium spark-redshift

asked Jun 12 '20 at 08:21

user2322440

23
1
6

0

votes

1 answer

Apache Spark 2.4.0, AWS EMR, Spark Redshift and User class threw exception: java.lang.AbstractMethodError

I use Apache Spark 2.4.0, AWS EMR, and Spark Redshift, and I have just hit the following error while reading a Redshift table into a Spark DataFrame: User class threw exception: java.lang.AbstractMethodError at…

apache-spark amazon-emr spark-redshift

asked Mar 11 '19 at 13:16

alexanoid

24,051
54
210
410

-1

votes

1 answer

Redshift SQL query for reducing years

I have data with the fields shown below:

id   grade  grade_id  year  Diff
101  5      7         2022  9
105  k      2         2021  2
106  4      6         2020  5
110  pk     1         2022  1

I want to insert records for the same id until we reach grade = pk, as shown below for every record in…

amazon-redshift spark-redshift amazon-redshift-serverless

asked Sep 15 '22 at 08:30

Sri Harsha

11
4

-1

votes

1 answer

How to connect from locally installed Spark to AWS Redshift?

I downloaded the necessary libraries to connect to Redshift from a locally installed Spark cluster and launched pyspark with the command below, but I am getting the error message below. pyspark --conf…

apache-spark pyspark amazon-redshift spark-redshift

asked May 28 '21 at 15:26

john

51
7

-2

votes

1 answer

Load data from Redshift using Spark and Scala on EMR

I am trying to connect to Redshift using Spark with Scala in Zeppelin from an EMR cluster. I used the spark-redshift library, but it doesn't work. I tried many solutions and I don't know why it gives an error: val df = spark.read…

scala apache-spark amazon-redshift amazon-emr spark-redshift

asked Feb 22 '20 at 17:03

MZoual

1
2

Questions tagged [spark-redshift]