Questions tagged [spark-redshift]
28 questions
0
votes
1 answer
Best way to process Redshift data on Spark (EMR) via Airflow MWAA?
We have an Airflow MWAA cluster and a huge volume of data in our Redshift data warehouse. We currently process the data directly in Redshift (w/ SQL), but given the amount of data, this puts a lot of pressure on the data warehouse and it is less and…

val
- 329
- 2
- 16
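A minimal sketch of the pattern this question is asking about, assuming the spark-redshift community connector is on the EMR classpath; the IAM role, S3 temp dir, query, and column names are placeholders reusing the cluster endpoint style from the questions below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-offload").getOrCreate()

# Read a Redshift query result into a Spark DataFrame; the connector unloads
# the result to the S3 tempdir and reads it back into Spark.
df = (
    spark.read
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db?user=etl_user&password=***")
    .option("query", "SELECT * FROM sales WHERE sale_date >= '2024-01-01'")      # hypothetical query
    .option("tempdir", "s3://my-temp-bucket/redshift-unload/")
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-s3-role")
    .load()
)

# Heavy aggregations now run on EMR instead of inside the warehouse.
daily_totals = df.groupBy("customer_id").count()
daily_totals.write.mode("overwrite").parquet("s3://my-output-bucket/aggregates/")

With this shape, the Airflow MWAA DAG only has to submit the EMR step; the transformation load moves off Redshift.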
0
votes
1 answer
Unload all tables from Redshift to S3 - CPU usage
The goal is to unload a few tables (for each customer) every few hours to S3 in Parquet format.
Each table is around 1 GB in CSV format; in Parquet it is around 120 MB.
The issue is that when running 2-3 parallel unload commands, the CPU of the Redshift nodes…

omri_saadon
- 10,193
- 7
- 33
- 58
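A minimal sketch of one way to keep the CPU hit bounded, assuming psycopg2 and an IAM role that Redshift can assume for S3 writes; table names, bucket, and credentials are placeholders. Issuing the UNLOADs one at a time (or capping their concurrency in WLM) avoids stacking several UNLOADs on the cluster at once:

import psycopg2

TABLES = ["customer_a.orders", "customer_b.orders"]   # hypothetical per-customer tables
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-unload-role"

conn = psycopg2.connect(
    host="clustername.yyyyy.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="db", user="etl_user", password="***",
)
conn.autocommit = True

with conn.cursor() as cur:
    for table in TABLES:
        # One UNLOAD at a time; each writes Parquet files directly to S3.
        cur.execute(f"""
            UNLOAD ('SELECT * FROM {table}')
            TO 's3://my-bucket/exports/{table}/'
            IAM_ROLE '{IAM_ROLE}'
            FORMAT AS PARQUET;
        """)
conn.close()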
0
votes
1 answer
Is it possible to load partitioned parquet files using Redshift COPY command?
For the sake of example, let's say I have a Parquet file in S3 partitioned by the column date with the following format:
s3://my_bucket/path/my_table/date=*
So when I load the table using Spark, for example, it shows the…

Henrique Florencio
- 3,440
- 1
- 18
- 19
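COPY does not derive a partition column from the S3 prefix, so the usual answers are either a Redshift Spectrum external table partitioned by date, or a per-partition COPY into a staging table that lacks the date column, filling the date in afterwards. A minimal sketch of the second approach, assuming psycopg2; the staging/target table names, partition values, and IAM role are placeholders:

import psycopg2

IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"
PARTITIONS = ["2023-01-01", "2023-01-02"]   # hypothetical date= partition values

conn = psycopg2.connect(
    host="clustername.yyyyy.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="db", user="etl_user", password="***",
)
conn.autocommit = True

with conn.cursor() as cur:
    for dt in PARTITIONS:
        # Load one partition's files; staging_my_table has every column except date.
        cur.execute(f"""
            COPY staging_my_table
            FROM 's3://my_bucket/path/my_table/date={dt}/'
            IAM_ROLE '{IAM_ROLE}'
            FORMAT AS PARQUET;
        """)
        # Re-attach the partition value as a regular column in the target table.
        cur.execute(f"INSERT INTO my_table SELECT *, DATE '{dt}' FROM staging_my_table;")
        cur.execute("TRUNCATE staging_my_table;")
conn.close()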
0
votes
2 answers
Redshift external catalog error when copying parquet from s3
I am trying to copy Google Analytics data into Redshift via Parquet format. When I limit the columns to a few select fields, I am able to copy the data. But when I include a few specific columns I get an error:
ERROR: External Catalog Error. Detail:…

Sandeep Singh
- 432
- 6
- 17
0
votes
1 answer
EMR PySpark write to Redshift: java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only
I got an error when trying to write data to Redshift using PySpark on an EMR cluster.
df.write.format("jdbc") \
.option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db") \
.option("driver",…
0
votes
1 answer
How to optimize ETL data pipeline for fault tolerance when using Spark and Redshift?
I'm writing a big batch job using PySpark that ETLs 200 tables and loads them into Amazon Redshift.
These 200 tables are created from one input datasource. So the batch job is successful only when data is loaded into ALL 200 tables successfully. The…

snackbar
- 93
- 7
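A minimal sketch of one common answer to this, assuming the Spark job first lands every table into a staging_<name> table: the promotion into the 200 targets then runs in a single Redshift transaction, so a failure anywhere rolls the whole batch back. psycopg2, table names, and credentials are placeholders:

import psycopg2

TABLES = [f"table_{i:03d}" for i in range(200)]   # hypothetical target table names

conn = psycopg2.connect(
    host="clustername.yyyyy.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="db", user="etl_user", password="***",
)
try:
    with conn:                        # one transaction: commit on success, rollback on any error
        with conn.cursor() as cur:
            for t in TABLES:
                # DELETE, not TRUNCATE: TRUNCATE commits immediately in Redshift
                # and would break the all-or-nothing guarantee.
                cur.execute(f"DELETE FROM {t};")
                cur.execute(f"INSERT INTO {t} SELECT * FROM staging_{t};")
finally:
    conn.close()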
0
votes
0 answers
AWS, dotnet spark, and redshift are not working
Hi, I am having problems getting Redshift and dotnet spark working.
This is the configuration I use to get it working in debug mode:
C:\bin\spark-2.4.1-bin-hadoop2.7\bin\spark-submit.cmd `
--jars…
0
votes
1 answer
I would like to know whether the spark-redshift libraries are open-source/free to use or have to be licensed via Databricks
I want to use the spark-redshift libraries for writing data from AWS S3 to AWS Redshift using the following code.
Before using this, I would like to know whether the spark-redshift libraries are open-source/free to use or have to be licensed via…

Sow
- 71
- 1
- 4
0
votes
0 answers
Can we create a day-wise snapshot in the target database (Redshift) as rows using Debezium?
Can we create a day-wise snapshot of a table in the target database as rows using Debezium?

user2322440
- 23
- 1
- 6
0
votes
1 answer
Apache Spark 2.4.0, AWS EMR, Spark Redshift and User class threw exception: java.lang.AbstractMethodError
I use Apache Spark 2.4.0, AWS EMR and Spark Redshift, and I am currently facing the following error while reading a Redshift table into a Spark DataFrame:
User class threw exception: java.lang.AbstractMethodError
at…

alexanoid
- 24,051
- 54
- 210
- 410
-1
votes
1 answer
Redshift SQL query for reducing years
I have data with the fields shown below:

id   grade  grade_id  year  Diff
101  5      7         2022  9
105  k      2         2021  2
106  4      6         2020  5
110  pk     1         2022  1

I want to insert records for the same id until we reach grade = pk, like shown below for every record in…

Sri Harsha
- 11
- 4
-1
votes
1 answer
How to connect from locally installed Spark to AWS Redshift?
I downloaded the necessary libraries to connect to Redshift from a locally installed Spark cluster and launched pyspark with the command below, but I am getting the error message below.
pyspark --conf…

john
- 51
- 7
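A minimal sketch of a plain JDBC read from a local Spark install, assuming the Amazon Redshift JDBC driver jar has been downloaded and passed to pyspark (e.g. pyspark --jars /path/to/redshift-jdbc42.jar); endpoint, table name, and credentials are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-redshift-read").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://clustername.yyyyy.us-east-1.redshift.amazonaws.com:5439/db")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .option("dbtable", "public.some_table")    # hypothetical table
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)
df.show(5)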
-2
votes
1 answer
Load data from Redshift using Spark and Scala in EMR
I am trying to connect to Redshift using Spark with Scala in Zeppelin on an EMR cluster. I used the spark-redshift library but it doesn't work. I tried many solutions and I don't know why it gives an error:
val df = spark.read…

MZoual
- 1
- 2