
I want to transfer data from a database such as MySQL (RDS) to S3 using AWS Glue ETL. I am having difficulty doing this, and the documentation is not very helpful. I found this related question on Stack Overflow:

Could we use AWS Glue just copy a file from one S3 folder to another S3 folder?

Based on that question, it seems that Glue cannot use an S3 bucket as a data destination, although it may be able to use one as a data source. I hope I am wrong about this. If one builds an ETL tool on AWS, one of the first basics is for it to transfer data to and from an S3 bucket, the major form of storage on AWS.

I hope someone can help with this.

Palu

2 Answers


You can add a Glue connection to your RDS instance and then use a Spark ETL script to write the data to S3.

First, crawl the database table with a Glue crawler. This creates a table in the Data Catalog, which the job can then use to transfer the data to S3. If you do not need any transformation, you can use the UI steps for autogenerated ETL scripts directly.
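For reference, here is a minimal sketch of what such a job script might look like, assuming the crawler has already registered the table. The function names `s3_sink_options` and `run_job` are hypothetical helpers made up for illustration, and the database, table, and bucket names you pass in are your own. The `awsglue` module is only available inside the Glue job runtime, so it is imported inside the function.

```python
def s3_sink_options(bucket, prefix):
    """Build the connection options for an S3 sink.

    Hypothetical helper: it only assembles the target path.
    """
    return {"path": f"s3://{bucket}/{prefix.strip('/')}/"}


def run_job(database, table_name, bucket, prefix):
    """Copy one Data Catalog table to S3 without any transformation."""
    # These imports only resolve inside the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the table that the crawler registered in the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )

    # Write the rows to the S3 bucket unchanged, as Parquet files.
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options=s3_sink_options(bucket, prefix),
        format="parquet",
    )
```

Inside a Glue job you would call something like `run_job("my_catalog_db", "my_table", "my-target-bucket", "exports/my_table")`, where all four values are placeholders. The autogenerated script from the UI does roughly the same read and write, with extra boilerplate for job arguments and bookmarks.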

I have also written a blog post on how to migrate relational databases to Amazon S3 using AWS Glue. Let me know if it addresses your query.

https://ujjwalbhardwaj.me/post/migrate-relational-databases-to-amazon-s3-using-aws-glue

Ujjwal Bhardwaj
  • Hi Ujjwal, thanks for your reply and your link; it will be useful. But it seems you are saying that one cannot use Glue ETL to move data directly to an S3 bucket. Is this correct? I am looking for confirmation that this is the case, because I wanted to do this without resorting to writing scripts in a programming language. – Palu Aug 12 '19 at 15:47
  • Another question for you: if, instead of an RDS instance, I just have a delimited file sitting in an S3 bucket, can one use Glue ETL to "grab" this data? – Palu Aug 12 '19 at 15:48
  • Hey Palu. Glue is an ETL tool provided by AWS. You will need an ETL script to move the data. The best they can do to avoid writing code is to give you some UI steps that generate the code for you. You get these steps when you select "A proposed script generated by AWS Glue" while creating a Glue job. Regarding the other query: yes, one can use Glue ETL to "grab" this data. – Ujjwal Bhardwaj Aug 14 '19 at 05:38

Have you tried https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copyrdstos3.html?

You can use AWS Data Pipeline. It has standard templates for full as well as incremental copies from RDS to S3.

Sandeep Fatangare
  • Hi Sandeep, thanks for this. I was already aware of AWS Data Pipeline. It is just sad that they apparently did not make this capability as easy in AWS Glue as they have in Data Pipeline. – Palu Aug 13 '19 at 20:14
  • So far, no one has said with confidence that AWS Glue, in its current state, cannot move data to an S3 bucket natively (meaning without having to code in another language). I hope someone will let me know if this is the case. – Palu Aug 13 '19 at 20:17