Highest Voted 'aws-glue-spark' Questions

2

votes

1 answer

Non-Partitioned Table Schema not updated with Glue ETL Job

We have an ETL job that uses the code snippet below to update the catalog table: sink = glueContext.getSink(connection_type='s3', path=config['glue_s3_path_bc'], enableUpdateCatalog=True,…

asked May 12 '22 at 21:34

Krunal Patel

85
1
8

2

votes

0 answers

Glue secret manager integration: secretId is not provided

I am running a Glue PySpark script from my local machine using the GlueETL library. When creating a dataframe from the Glue catalog, dyf_user_book_reading_stat = glueContext.create_dynamic_frame.from_catalog( database="xxx-db", …

aws-glue aws-glue-data-catalog aws-glue-spark aws-glue-connection aws-glue3.0

asked Apr 26 '22 at 07:49

sheetal_158

7,391
6
27
44

2

votes

1 answer

AWS Glue - Job Monitoring: Job Execution, Active Executors and Maximum Needed Executors not showing

I have set up an ETL job in AWS Glue with the following settings: Glue v.3.0, Python v.3, Spark v.3.1 and Worker type G.1X with 10 Workers and Job metrics enabled. When I'm looking at the job metrics after the job is finished, I see in the Job…

apache-spark pyspark monitoring aws-glue aws-glue-spark

asked Mar 30 '22 at 09:23

Qwaz

199
9

2

votes

1 answer

Unsupported case of DataType: com.amazonaws.services.glue.schema.types.StringType@e7b95c9 and DynamicNode: longnode

I am trying to extract 27 DynamoDB tables from a single database using the Visual editor in AWS Glue. I have successfully crawled the database, and my workflow for the job is: Extract from Source table (DynamoDB). Apply Transform (usually 1:1 and…

amazon-web-services aws-glue aws-glue-data-catalog aws-glue-spark

asked Mar 21 '22 at 21:31

Ross Alxndr

73
2
4

2

votes

0 answers

AWS Glue script missing a Parquet file

My AWS Glue script, written in PySpark, usually works great and creates Parquet files, but occasionally I am missing a Parquet file. How can I ensure against / mitigate missing data? The pertinent code is: FinalDF.write.partitionBy("Year",…

amazon-s3 pyspark parquet aws-glue-spark

asked Mar 21 '22 at 19:00

Judy K

31
2

2

votes

1 answer

Unable to add/import additional Python library datacompy in AWS Glue

I am trying to import an additional Python library, datacompy, into a Glue job that uses version 2, with the steps below: Open the AWS Glue console. Under Job parameters, added the following: For Key, added --additional-python-modules. For Value, added…

python-3.x aws-glue python-module aws-glue-spark

asked Feb 20 '22 at 11:43

cloud_hari

147
1
8

2

votes

2 answers

GlueJobRunnerSession is not authorized to perform: lakeformation:GetDataAccess on resource

I am trying to use the glueContext.purge_table function in my AWS Glue job. Whenever the job is executed, it throws the following error: An error occurred while calling o82.purgeTable. : java.lang.RuntimeException: class…

amazon-web-services aws-glue amazon-athena aws-glue-spark

asked Jan 04 '22 at 17:00

Nabeel Khan Ghauri

125
1
4
15

2

votes

1 answer

PySpark dataframe: remove duplicates in AWS Glue script

I have a script in an AWS Glue ETL job that reads an S3 bucket with a lot of Parquet files and sorts by key1, key2 and a timestamp field. After that, the script deletes the duplicates and saves a single Parquet file in another S3 bucket. Look at the data…

dataframe pyspark apache-spark-sql aws-glue aws-glue-spark

asked Dec 29 '21 at 16:47

Murillo Mamud

119
1
8
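
The deduplication described in the question above can be illustrated without Spark. A minimal sketch in plain Python, assuming the goal is to keep the most recent row per (key1, key2); the excerpt does not state which duplicate should survive, and the field name `timestamp` and the sample rows are hypothetical. In PySpark the same idea is commonly expressed with a window partitioned by the keys and ordered by the timestamp, but that is not shown here.

```python
def latest_per_key(rows, keys=("key1", "key2"), ts_field="timestamp"):
    """Keep, for each key combination, the row with the greatest timestamp.

    `ts_field` is an assumed field name; the question only mentions
    "a timestamp field".
    """
    latest = {}
    for row in rows:
        k = tuple(row[f] for f in keys)
        # Replace the stored row only when this one is strictly newer.
        if k not in latest or row[ts_field] > latest[k][ts_field]:
            latest[k] = row
    # Dicts preserve insertion order, so keys appear in first-seen order.
    return list(latest.values())


# Hypothetical sample data, for illustration only.
rows = [
    {"key1": "a", "key2": 1, "timestamp": 10, "value": "old"},
    {"key1": "a", "key2": 1, "timestamp": 20, "value": "new"},
    {"key1": "b", "key2": 2, "timestamp": 5, "value": "only"},
]
deduped = latest_per_key(rows)
```

Unlike a sort followed by a generic drop of duplicates, this makes the choice of surviving row explicit, which matters when which duplicate is kept affects the result.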

2

votes

0 answers

Overwrite mode in spark causing issues

I am running an AWS Glue PySpark job where I am reading the S3 raw path where the data has been loaded from Redshift, and I am doing some transformations on top of it. Below is my code: data = spark.read.parquet(rawPath) # complete dataset.…

amazon-web-services apache-spark pyspark aws-glue aws-glue-spark

asked Nov 09 '21 at 16:12

whatsinthename

1,828
20
59

2

votes

1 answer

Is it possible to read a fixed-length file in AWS Glue directly without using a crawler?

Is it possible to read a fixed-length file in AWS Glue using DynamicFrameReader from_options without using crawlers? I found the solution below using Spark, but is there a way to do this in Glue directly? pyspark parse fixed width text file

aws-glue aws-glue-spark

asked Oct 01 '21 at 07:36

Aji C S

71
7
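
For the fixed-length file question above, the underlying parsing step is slicing each line at known column offsets. A minimal sketch in plain Python, assuming a hypothetical record layout (the field names and widths are illustrative and not taken from the question). It does not use any Glue reader; it only shows the per-line slicing that would apply after the file has been read as plain text lines.

```python
from itertools import accumulate

# Hypothetical layout: (field name, width in characters). Not from the question.
LAYOUT = [("id", 5), ("name", 10), ("amount", 8)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped string fields."""
    widths = [width for _, width in layout]
    ends = list(accumulate(widths))   # cumulative end offset of each field
    starts = [0] + ends[:-1]          # each field starts where the previous ends
    return {
        name: line[start:end].strip()
        for (name, _), start, end in zip(layout, starts, ends)
    }


# Build a sample line with explicit padding so the widths are unambiguous.
sample_line = "00042" + "Alice".ljust(10) + "00012.50"
record = parse_fixed_width(sample_line)
```

All values come back as strings; any type conversion (for example, turning `amount` into a number) would be a separate step.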

2

votes

2 answers

End/exit a glue job programmatically

I am using Glue bookmarking to process data. My job is scheduled every day, but can also be launched "manually". Since I use bookmarks, sometimes the Glue job can start without having new data to process; the read dataframe is then empty. In this…

python pyspark aws-glue exit aws-glue-spark

asked Sep 22 '21 at 09:24

Jérémy

1,790
1
24
40

2

votes

1 answer

AWS Glue - Convert the JSON response from a GET (REST API) request to a DataFrame/DynamicFrame and store it in an S3 bucket

headersAPI = { 'Content-Type': 'application/json' , 'accept': 'application/json' ,'Authorization': 'Bearer…

python amazon-s3 aws-glue aws-glue-data-catalog aws-glue-spark

asked Jul 27 '21 at 20:17

Chandar

31
4

2

votes

1 answer

How to join / concatenate / merge all rows of an RDD in PySpark / AWS Glue into one single long line?

I have a protocol that needs to take in many (read: millions of) records. The protocol requires that all of the data be a single line feed (InfluxDB / QuestDB). Using the InfluxDB client isn't currently an option, so I need to do this via a socket. I am at…

pandas apache-spark pyspark aws-glue aws-glue-spark

asked Jul 21 '21 at 10:40

the1dv

893
7
14

2

votes

0 answers

NullPointerException on processing Glue job

I am facing a problem with AWS Glue. The code imports two dataframes from 100s of small parquet files, using: context.create_dynamic_frame_from_options(...) The process completes successfully and the data is cleaned with null/duplicate values…

aws-glue aws-glue-spark

asked Jun 09 '21 at 13:51

Jaco Van Niekerk

4,180
2
21
48

2

votes

1 answer

Find or recover a deleted AWS Glue job

I have accidentally deleted an AWS Glue job, but I don't remember which one. Can I check from some logs which job I deleted, and recover it?

amazon-web-services aws-glue aws-glue-spark

asked May 03 '21 at 21:15

user13067694
