I have been looking at options to load a Parquet file from S3 into DynamoDB (essentially empty the table and restore it from the file). The Parquet file itself is created by a Spark job that runs on an EMR cluster. Here are a few things to keep in mind:
- I cannot use AWS Data pipeline
- The file will contain millions of rows (say 10 million), so I need an efficient solution. I suspect the boto API (even with batch writes) may not be efficient enough; a sketch of what I mean follows this list.
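
For reference, this is roughly the boto approach I'm referring to (a minimal sketch; the bucket, key, and table names are placeholders, and it assumes pyarrow is available for reading the Parquet file):

```python
import boto3
import pyarrow.parquet as pq

# Pull the Parquet file down from S3 (placeholder bucket/key).
s3 = boto3.client("s3")
s3.download_file("my-bucket", "path/to/file.parquet", "/tmp/file.parquet")

# Read the whole file into a list of dicts, one per row.
# Note: DynamoDB rejects Python floats, so numeric columns may
# need converting to Decimal before writing.
rows = pq.read_table("/tmp/file.parquet").to_pylist()

table = boto3.resource("dynamodb").Table("my-table")

# batch_writer buffers items into 25-item BatchWriteItem calls and
# retries unprocessed items, but it is still a single-threaded loop,
# which is my concern at ~10 million rows.
with table.batch_writer() as writer:
    for row in rows:
        writer.put_item(Item=row)
```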
Are there any other alternatives?