I'm trying to load data from S3 into an Aurora MySQL instance. I first did it with PySpark, and the throughput was about 4 GB per hour.
# Plain JDBC write of the DataFrame into Aurora MySQL
current_df.write.format("jdbc").options(
    url=url,
    driver=jdbc_driver,
    dbtable=table_name,
    user=username,
    password=password
).mode("overwrite").save()
I added a few performance tweaks and throughput improved to roughly 7 GB per hour, but that's still not great. Parameters added to the JDBC URL:
useServerPrepStmts=false&rewriteBatchedStatements=true
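For reference, this is roughly how the tuned write looks after those changes (a sketch; the endpoint in the URL, the repartition count, and the batchsize value are placeholders I chose for illustration, not my exact settings):

# Sketch of the tuned write; endpoint, partition count, and batch size are placeholders
url = (
    "jdbc:mysql://<aurora-endpoint>:3306/<db>"
    "?useServerPrepStmts=false&rewriteBatchedStatements=true"
)

(current_df
    .repartition(32)                 # more partitions -> more parallel JDBC connections
    .write
    .format("jdbc")
    .option("url", url)
    .option("driver", jdbc_driver)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .option("batchsize", 10000)      # rows per batched INSERT, pairs with rewriteBatchedStatements
    .mode("overwrite")
    .save())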
I also tried another approach:
LOAD DATA FROM S3 's3://${s3.bucket}/${filename}' INTO TABLE ${TableName} FIELDS TERMINATED BY ',';
That approach loads about 5 GB per hour into MySQL.
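One idea I'm considering is splitting the export into many smaller files and issuing several LOAD DATA FROM S3 statements in parallel. A rough sketch of that idea (the bucket layout, chunk count, table name, and connection variables below are assumptions, not my actual setup):

# Sketch: run LOAD DATA FROM S3 over many file chunks in parallel.
# Bucket/key layout, chunk count, table name, and connection variables are placeholders.
from concurrent.futures import ThreadPoolExecutor
import pymysql

files = [f"s3://my-bucket/export/part-{i:05d}.csv" for i in range(16)]

def load_one(s3_path):
    conn = pymysql.connect(host=db_host, user=username, password=password, database=db_name)
    try:
        with conn.cursor() as cur:
            cur.execute(
                f"LOAD DATA FROM S3 '{s3_path}' "
                f"INTO TABLE my_table FIELDS TERMINATED BY ','"
            )
        conn.commit()
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(load_one, files)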
I have close to 2 TB of data that needs to be loaded into the MySQL instance. Is there any way to load the data faster?