
I have an S3 folder with 40,000+ JSON files, where each file has the following format:

[{"AAA": "XXXX", "BBB": "XXXX", "CCC": "XXXX"}]

My goal is to read these JSON files (all in one S3 folder), combine them into one structured table, perhaps perform some transformation on the data, and then load the result into a MySQL table. This process will probably need to run on a weekly basis.

Is there a quicker way to do ETL on this kind of data source? I would appreciate any feasible recommendation. Thanks a lot!

I tried reading each JSON file through boto3 with something like `obj.get()['Body'].read()` (in Python); however, iterating over all the files took a few hours to run.
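
Roughly, that approach was a sequential loop like the sketch below (the bucket name and prefix are placeholders):

    import json
    import boto3

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("my-bucket")  # placeholder bucket name
    records = []

    # One GET request per object; with 40,000+ small files this
    # sequential loop is what takes hours.
    for obj in bucket.objects.filter(Prefix="my/json/folder/"):  # placeholder prefix
        body = obj.get()["Body"].read()
        records.extend(json.loads(body))  # each file holds a list of dicts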

asked by ahboy

2 Answers


You could use Amazon Athena to read the JSON files, perform the ETL, and output the data in CSV (or another) format. It can read multiple files, understand JSON, and perform normal SQL manipulation.
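
A minimal sketch of driving an Athena query from Python with boto3, assuming a table (here the hypothetical my_json_table in database my_db) has already been defined over the S3 prefix, for example via a Glue crawler or a CREATE EXTERNAL TABLE statement; the query results are written to S3 as CSV:

    import time
    import boto3

    athena = boto3.client("athena")

    # Start an asynchronous query; Athena writes the result set as CSV
    # to the given S3 output location.
    response = athena.start_query_execution(
        QueryString="SELECT aaa, bbb, ccc FROM my_json_table",  # hypothetical table
        QueryExecutionContext={"Database": "my_db"},            # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query finishes (SUCCEEDED / FAILED / CANCELLED).
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(5)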

If you are running Amazon Aurora, you could Load Data into an Amazon Aurora MySQL DB Cluster from Text Files in an Amazon S3 Bucket.
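
A rough sketch of that approach, assuming the CSV data has already been exported to an S3 prefix and the Aurora cluster has been granted read access to the bucket via an IAM role (connection details, table, and prefix below are placeholders):

    import pymysql

    # Hypothetical connection details.
    conn = pymysql.connect(host="my-aurora-endpoint", user="admin",
                           password="secret", database="my_db")

    # Aurora MySQL extension that reads CSV objects under an S3 prefix
    # directly into a table.
    load_sql = """
        LOAD DATA FROM S3 PREFIX 's3://my-bucket/exported-csv/'
        INTO TABLE my_table
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        (aaa, bbb, ccc)
    """

    with conn.cursor() as cur:
        cur.execute(load_sql)
    conn.commit()
    conn.close()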

If it is normal MySQL, you could Load S3 Data into Amazon RDS MySQL Table with AWS Data Pipeline.

answered by John Rotenstein

If (1) you are interested in getting the data into a tabular format, and (2) you are comfortable with Python, then once you read the JSON file(s) (using boto3 or aws-data-wrangler) you could do something like:

    import awswrangler as wr

    # Reads every JSON file under the prefix into one pandas DataFrame.
    path = "s3://" + your_bucket_str + "/" + your_path_str
    df = wr.s3.read_json(path=path, dataset=True)

and then put the data in your desired location (e.g. MySQL) using mysql.connector or pymysql. More details on how to accomplish this step are available here.
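
A minimal sketch of that last step with pymysql, assuming a hypothetical target table my_table with columns aaa, bbb, ccc and placeholder connection details:

    import pymysql

    # Hypothetical connection details.
    conn = pymysql.connect(host="my-mysql-host", user="admin",
                           password="secret", database="my_db")

    # Convert the DataFrame columns into plain tuples for executemany.
    rows = list(df[["AAA", "BBB", "CCC"]].itertuples(index=False, name=None))

    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO my_table (aaa, bbb, ccc) VALUES (%s, %s, %s)",
            rows,
        )
    conn.commit()
    conn.close()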

answered by xxyjoel