We are in a bit of a pickle with our DynamoDB data. We have a DynamoDB table with millions of records currently in use in production. Transaction data was loaded into this table by a nightly ingestion process. Unfortunately, when the ingestion process was built, some common data validations were not considered, and because of that we are facing JSON parsing issues when we retrieve some of these older records. We have quite a lot of affected records (100,000+, maybe more, as it's hard to tell until the data is extracted), so fixing them manually would not be feasible. I was wondering if anyone had come across similar issues and, if so, how they were addressed.
-
Just trying to understand what's going on here. You've stored a JSON string as a string value in a DynamoDB attribute? It's a valid string, but not valid JSON. And when your app reads an item and tries to parse that JSON string, the parsing fails because the JSON is invalid? – jarmod Jul 23 '22 at 16:21
1 Answer
Your question is too generic to answer definitively.
You need to ask yourself the following questions:
Why did you ingest historical records into DynamoDB? DynamoDB is not designed to be an archive for old transactions. It is built for fast, scalable lookup of specific records by unique key. For retrieving and analyzing old transactions, you should consider other methods, such as S3 and Athena.
Why do you care about the data validation, and what exactly is the problem with these records? DynamoDB allows a flexible schema, which gives you the option to vary the attributes of each record. For example, some users can have many open orders, and others can have different address formats. It also allows you to evolve your schema over time without needing to migrate all the data.
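That said, if the immediate need is to find which items hold unparseable JSON, a first step is to scan the table and test each record. Below is a minimal sketch in Python with boto3; the attribute name `payload` and the table handle are assumptions, since your actual schema isn't shown, and the scan is paginated because a single `scan` call returns at most 1 MB of data.

```python
import json


def is_valid_json(raw):
    """Return True if raw parses as JSON, False otherwise."""
    try:
        json.loads(raw)
        return True
    except (json.JSONDecodeError, TypeError):
        # TypeError covers missing (None) attributes
        return False


def find_bad_items(table, json_attr="payload"):
    """Yield items whose json_attr does not parse as JSON.

    `table` is a boto3 DynamoDB Table resource; `json_attr` is the
    (hypothetical) string attribute holding the serialized JSON.
    """
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        for item in page.get("Items", []):
            if not is_valid_json(item.get(json_attr)):
                yield item
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
```

Once the bad items are identified, what the actual "fix" looks like depends on how the JSON is malformed (trailing commas, unescaped quotes, truncation, etc.), so you would plug a repair step into the loop, or export the bad keys and batch-update them separately. Note that a full scan of a table with millions of items consumes read capacity, so consider running it against a point-in-time export to S3 instead of the live table.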
