Highest Voted 'aws-glue-spark' Questions

1

vote

1 answer

Threading in AWS Glue

I have a piece of code that creates several threads on a Glue job like this: threads = [] for data_chunk in data_chunks: json_data = get_bulk_upload_json(data_chunk) …

amazon-web-services aws-glue aws-glue-spark

asked Jul 01 '22 at 20:09

rodrigocf

1

vote

0 answers

Record larger than the Split size in AWS GLUE?

I'm a newbie to AWS Glue and Spark, and I am building my ETL with them. When I connect to my S3 bucket, files of approximately 200 MB are not read. The error is: An error was encountered: An error occurred while calling o99.toDF. : org.apache.spark.SparkException:…

apache-spark pyspark aws-glue aws-glue-data-catalog aws-glue-spark

asked May 21 '22 at 02:33

Vitualizz Uzumaki

1

vote

1 answer

Cast Issue with AWS Glue 3.0 - Pyspark

I'm using Glue 3.0 data = [("Java", "6241499.16943521594684385382059800664452")] rdd = spark.sparkContext.parallelize(data) df = rdd.toDF() df.show() df.select(f.col("_2").cast("decimal(15,2)")).show() I get the following…

pyspark aws-glue aws-glue-spark aws-glue3.0

asked Apr 20 '22 at 09:42

Smaillns

1

vote

2 answers

AWS glue NoClassDefFoundError on job.init()

Trying to debug AWS Glue scripts locally using Glue ETL library. I have installed aws-glue-libs and spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz. When I run job.init(), I get the following error trace: py4j.protocol.Py4JJavaError: An error occurred while…

etl aws-glue aws-glue-spark aws-glue3.0

asked Apr 18 '22 at 16:54

sheetal_158

1

vote

0 answers

Exception: "SparkContext should only be created and accessed on the driver" while trying foreach()

Being new to Spark, I need to read data from a MySQL DB and then update (or upsert) rows in another table based on what I've read. AFAIK, unfortunately, there's no way to do an update with DataFrameWriter, so I want to try querying the DB directly…

apache-spark pyspark aws-glue-spark

asked Apr 15 '22 at 10:19

fracsinus

1

vote

0 answers

Trying to run pyspark code on docker image of aws_glue on mac

I get the following error: The code failed because of a fatal error. Some things to try: a) Make sure Spark has enough available resources for Jupyter to create a Spark context. b) Contact your Jupyter administrator to make sure the Spark magics…

amazon-web-services pyspark aws-glue jupyter-lab aws-glue-spark

asked Apr 15 '22 at 08:06

sheetal_158

1

vote

3 answers

How to capture data change in aws glue?

Our source data is in an on-premises SQL Server. We are using AWS Glue to fetch data from SQL Server and place it in S3. Could anyone please explain how we can implement change data capture in AWS Glue? Note: we don't want to use AWS DMS.

amazon-web-services aws-glue aws-glue-data-catalog aws-glue-spark

asked Apr 11 '22 at 07:54

gourav vijayvargiya

1

vote

1 answer

Using custom connector in AWS Glue ETL script

I am working on an AWS Glue ETL script using Glue's DynamicFrame abstraction and writing the code in Python. I created a JDBC connection resource named sap-lpr-connection in the Glue Data Catalog and would like to use it to retrieve the connection…

python amazon-web-services aws-glue aws-glue-spark aws-glue-connection

asked Apr 05 '22 at 08:52

LazyEval

1

vote

0 answers

Read schema from Glue Schema Registry with Pyspark and validate records

I am trying to read a schema from the AWS Glue Schema Registry and then validate data coming in from a Kafka topic. How can this be done with a Glue script?

pyspark apache-kafka aws-glue aws-glue-spark

asked Feb 21 '22 at 02:46

user3082928

1

vote

0 answers

Data import from MongoDB: duplicate columns

I'm trying to import data from MongoDB into an AWS Glue job and then into Redshift, but when loading from MongoDB I get this strange exception. Is there a way to fix this issue? AnalysisException: Found duplicate column(s) in the data schema:…

mongodb pyspark aws-glue aws-glue-spark

asked Nov 30 '21 at 17:09

Miroslav Petrovic

1

vote

1 answer

Adding column to dataFrame

I need to add a new column to a DataFrame (DynamicFrame) based on JSON data from another column. What's the best way to do it? schema: 'id' 'name' 'customJson' -------------------------- 1 ,John, {'key':'lastName','value':'Smith'} after: 'id' 'name'…

pyspark aws-glue aws-glue-spark

asked Nov 24 '21 at 17:08

Miroslav Petrovic

1

vote

0 answers

AWS Glue ETL Job - Connection Refused error (Catalog Table as input)

I am trying to run a Glue ETL job whose input is a Glue Catalog table with its data in S3. I am getting the following error when running the job. The error seems to say that it is unable to connect to the Spark instance, but I am not sure…

amazon-web-services aws-glue aws-glue-data-catalog aws-glue-spark

asked Nov 09 '21 at 05:05

Van

1

vote

1 answer

AWS Glue null values are inserted on RDS as string

I created an AWS Glue job that loads data from a CSV file into a MySQL RDS database. The data is loaded successfully, but all NULL values were inserted into the MySQL table as strings, not as NULL. So if I query my table like select * from myTable where…

aws-glue aws-glue-data-catalog aws-glue-spark aws-glue-workflow

asked Oct 29 '21 at 21:04

adaso

1

vote

0 answers

How to prevent spark query against CSV glue catalog source from including headers?

I am attempting to build a Glue job that will execute a SQL query against an existing Glue catalog and store the results in another Glue catalog (in the example below, only return the record with the highest cost for each value of sn). When…

apache-spark pyspark aws-glue-data-catalog aws-glue-spark

asked Oct 15 '21 at 13:44

Brandon

1

vote

0 answers

How to convert a string to a date when the year has two digits in PySpark on AWS Glue

I tried to convert a ddMMyy string to yyyyMMdd using the to_date function, but Spark places the year in the 1900s. For example, I tried to cast 150545 to 20450515 but got 19450515. #my_date = '150545' df = df.withColumn('sorce_format', lit('ddMMyy')) …

amazon-web-services apache-spark pyspark aws-glue-spark

asked Oct 11 '21 at 15:31

Eriton Silva
