Questions tagged [aws-glue-workflow]
42 questions
13
votes
1 answer
How to configure Spark / Glue to avoid creation of empty $_folder_$ after Glue job successful execution
I have a simple Glue ETL job which is triggered by a Glue workflow. It drops duplicate data from a crawler table and writes the result back into an S3 bucket. The job completes successfully. However, the empty folders that Spark generates "$folder$"…

Lina
- 1,217
- 1
- 15
- 28
4
votes
0 answers
AWS Glue Workflow marked with status `Completed` even on Glue job errors
I am creating Glue Workflow using CDK as shown below. It is composed of Glue jobs and crawlers. Is it possible to mark the status of the Workflow as Error when any of the components fail? Currently it is always marked as Completed.
const etlWorkflow…

Krzysztof Słowiński
- 6,239
- 8
- 44
- 62
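One possible way to detect the situation described in this question is to look at a run's per-action statistics rather than its overall status. The following is a minimal sketch, assuming the run dictionary has the shape returned under `Run` by boto3's `get_workflow_run`; the helper name `workflow_run_failed` and the sample data are hypothetical.

```python
def workflow_run_failed(run: dict) -> bool:
    """Return True if any action (job or crawler) in the run did not succeed."""
    stats = run.get("Statistics", {})
    # As the question describes, the overall status may read COMPLETED even
    # when components failed, so the per-action counters are checked instead.
    problem_keys = ("FailedActions", "TimeoutActions", "StoppedActions")
    return any(stats.get(key, 0) > 0 for key in problem_keys)


# Hypothetical sample shaped like response["Run"] from
# boto3.client("glue").get_workflow_run(Name=..., RunId=...)
sample_run = {
    "Status": "COMPLETED",
    "Statistics": {"TotalActions": 3, "SucceededActions": 2, "FailedActions": 1},
}
print(workflow_run_failed(sample_run))
```

A check like this could run in a scheduled Lambda function or a final workflow step and raise an alert when it returns true; that wiring is left out here.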
3
votes
0 answers
AWS Glue Crawler creates multiple tables when reading empty files
I'm writing a Glue Crawler as a part of an ETL, and I have a very annoying problem -
The S3 bucket I'm crawling contains many different JSON files, all with the same schema. When crawling the bucket, the crawler creates a new table for every empty…

Golden
- 407
- 2
- 12
3
votes
0 answers
AWS Glue- Data Lineage and Job Tracking
Is there a way to track what each job we create in AWS Glue is doing? For example, whether jobs doing the same action have been created twice, or the lineage of the data as it goes through each transformation?

Shilpa Majumdar
- 31
- 1
2
votes
1 answer
Error in AWS Glue job "LAUNCH ERROR | File --class does not existPlease refer logs for details."
I'm getting an error after running a Glue job from workflow.
The error states
"LAUNCH ERROR | File --class does not existPlease refer logs for details."
We have also tried passing the job parameter "--class GlueApp", though our job is Python.
I think…

Tanmoy Santra
- 55
- 1
- 4
2
votes
0 answers
How to monitor an AWS Glue Workflow
I have a Glue Workflow consisting of multiple AWS Glue jobs, and I want to be alerted when it fails. Currently I have CloudWatch alarms on each of the individual jobs that make up the workflow. The problems with my current solution are that it…

MikeFHay
- 8,562
- 4
- 31
- 52
2
votes
2 answers
AWS Glue - Can conditional triggers fire with conditional on jobs from another workflow?
I am using the AWS Glue service with two separate workflows (let's say workflow A and workflow B).
I have created a conditional-type trigger in workflow B that watches jobs in workflow A and supposedly fires when they succeed. Can this trigger…

LazyEval
- 769
- 1
- 8
- 22
2
votes
0 answers
AWS Glue: Column "column_name" not found in schema
I'm trying to create an ETL job in AWS Glue. The use case is as follows: when a column is added to one of the source tables after the ETL job has run and we try to rerun the job, it fails saying column not found (in target…

Jake
- 391
- 1
- 4
- 22
2
votes
1 answer
Installing AWS Glue ETL Library
Issue
I am facing the below error after having set up the AWS Glue Library:
PS C:\Users\[user]\Documents\[company]\projects\code\data-lake\etl\tealium> python visitor.py
20/04/05 19:33:14 WARN NativeCodeLoader: Unable to load native-hadoop library…

Rafael Vasconcelos Silva
- 61
- 2
- 7
1
vote
1 answer
AWS Glue Workflow to trigger email on any ETL job failure using Amazon SES
In AWS Glue, I am executing a couple of ETL jobs using a workflow. Now I want to inform the business via email when any of the ETL jobs fails. I need help getting the name of the failed job and the error that caused it to fail, and passing them to the job which…

B Kiran
- 11
- 2
1
vote
1 answer
AWS Glue PySpark job deletes S3 folder unexpectedly
My Glue workflow is DDB -> Glue table (using a crawler) -> S3 (using a Glue job).
I create the S3 folder manually before the workflow runs.
For a DDB table of about 500 MB it always works fine (it takes 7-10 min to finish), and the S3 path will have correct…

DD Jin
- 355
- 1
- 3
- 15
1
vote
0 answers
How to Dynamically create ETL jobs in AWS Glue with workflow
I am trying to create dynamic/programmatic ETL jobs in Glue with basic mappings and transformations.
The scenarios are as follows:
When new files arrive in a particular S3 bucket, a job should trigger and load them into another S3 location.
How to handle dynamic mapping…

gvrspk
- 11
- 2
1
vote
1 answer
Can Glue Workflow or Trigger get parameters from EventBridge
My system design
I have created 4 Glue Jobs: testgluejob1, testgluejob2, testgluejob3 and common-glue-job.
An EventBridge rule detects the SUCCEEDED state of Glue jobs such as testgluejob1, testgluejob2, testgluejob3.
After getting Glue Job's SUCCEEDED…

O.Takashi
- 73
- 2
- 12
1
vote
2 answers
How to pass RunProperties while calling the glue workflow using boto3 and python in lambda function?
My python code in lambda function:
import json
import boto3
from botocore.exceptions import ClientError
glueClient = boto3.client('glue')
default_run_properties = {'s3_path': 's3://bucketname/abc.zip'}
response =…

Swarnitha
- 45
- 7
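For reference, boto3's Glue client accepts run properties directly on `start_workflow_run` through the `RunProperties` argument. The following is a minimal sketch, not the asker's code: the helper name `start_workflow_with_properties` and the workflow name `my-workflow` are hypothetical, and the Lambda execution role is assumed to allow `glue:StartWorkflowRun`.

```python
def start_workflow_with_properties(glue_client, workflow_name, run_properties):
    """Start a Glue workflow run with the given run properties; return its run ID."""
    response = glue_client.start_workflow_run(
        Name=workflow_name,
        RunProperties=run_properties,  # dict of string keys to string values
    )
    return response["RunId"]


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    glue_client = boto3.client("glue")
    run_id = start_workflow_with_properties(
        glue_client,
        "my-workflow",  # hypothetical workflow name
        {"s3_path": "s3://bucketname/abc.zip"},
    )
    return {"RunId": run_id}
```

Inside a job started by the workflow, the properties can then be read back with `get_workflow_run_properties(Name=..., RunId=...)`.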
1
vote
1 answer
AWS Glue null values are inserted on RDS as string
I created an AWS Glue job that loads data from a CSV file into a MySQL RDS database.
The data is loaded successfully, but all NULL values were inserted into the MySQL table as strings, not as NULL.
So if I query my table like select * from myTable where…

adaso
- 61
- 5