AWS Glue - multiple RDS tables in one job

Question

I am trying to use AWS Glue.
My data source is RDS (AWS Aurora) and the destination is S3.
My RDS database has many tables, and I would like to sync all of them to S3.

In the data source settings there is an input for the table name, but I can specify only one table name there. Can I specify multiple tables to be synced, or can I specify only one table per AWS Glue job?

my job flow:

score 0 · Answer 1 · answered Oct 10 '22 at 14:06


No, you are not limited to one table per job: you can use multiple tables in a job. Would something like this fit your use case?

The red exclamation marks are there because I did not actually specify the connection. The target bucket can be the same. I saw that you cannot add two JDBC connections to one bucket node, something like this:

answered Oct 10 '22 at 14:06

ExceptionNotThrownException


Thank you so much! Should I add 50 JDBC nodes when I have 50 tables? Also, can I specify a target table dynamically? – tesnirs Oct 10 '22 at 15:29
Of course, it all depends on what you want to achieve. In my experience it is good to work with Glue workflows and not put too many tables into one job. But if you have a use case where you have to work with 50 tables... maybe you should rethink that use case? As I said before, it depends. Regarding the dynamic target table: sure, you can choose SQL as the mapping and conditionally do what you want. Play around with the tool and get some experience. I personally worked with jobs written in Python with Apache Spark; Glue Studio is quite a new feature. – ExceptionNotThrownException Oct 10 '22 at 15:35
I also strongly recommend using Step Functions for ETL processes instead of Glue. It is faster and cheaper at the end of the day. – ExceptionNotThrownException Oct 10 '22 at 15:37
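
For the 50-table case raised in the comments, a script-based (Python/Spark) job can loop over table names instead of adding one source node per table. The following is only a minimal sketch, assuming the RDS tables have already been crawled into the Glue Data Catalog. The database name, bucket name, prefix, and the helper names `build_export_plan` and `run_export` are hypothetical; `create_dynamic_frame.from_catalog` and `write_dynamic_frame.from_options` are the standard `GlueContext` calls.

```python
from typing import Dict, List


def build_export_plan(tables: List[str], bucket: str,
                      prefix: str = "export") -> List[Dict[str, str]]:
    """Give every source table its own S3 prefix (hypothetical layout)."""
    return [
        {"table_name": t, "s3_path": f"s3://{bucket}/{prefix}/{t}/"}
        for t in tables
    ]


def run_export(glue_context, database: str, plan: List[Dict[str, str]]) -> None:
    """Read each catalog table and write it to its S3 path as Parquet.

    `glue_context` is expected to be an awsglue.context.GlueContext;
    it is passed in so this module can be imported outside of Glue.
    """
    for item in plan:
        frame = glue_context.create_dynamic_frame.from_catalog(
            database=database,
            table_name=item["table_name"],
        )
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="s3",
            connection_options={"path": item["s3_path"]},
            format="parquet",
        )


# Inside a Glue job script this might be wired up roughly as follows
# (names are placeholders, not taken from the question):
#
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   plan = build_export_plan(["users", "orders"], "my-target-bucket")
#   run_export(glue_context, "my_catalog_database", plan)
```

Keeping the table list as plain data means the target path is chosen per table at run time, which also covers the "dynamic target" part of the question without touching the visual editor.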
