I have to validate fixed-width files that I am reading from S3 into AWS Glue. I know the length of each column, and I need to write a Glue job to validate these files.
How do I efficiently check the length of every row to filter out the records that don't have the correct total_length?
What is the best way to read such files?
I tried reading the file as CSV into a single column, col0, in the DynamicFrame and then filtering on length using Filter, but this gives me a dictionary:
bad_length_DF = dynamicFramerawtxt.filter(lambda x: len(x['col0']) != total_row_len)
How do I remove the records with the wrong length from my DynamicFrame and put them into a separate ERROR_Dynamic frame?
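To make the intended check concrete, here is a minimal plain-Python sketch of the split I am after, written outside Glue so it can be run anywhere. The column widths, the constant names, and the helpers split_by_length and parse_row are hypothetical and only illustrate the logic; they are not part of any Glue API.

```python
from typing import Iterable, List, Tuple

# Hypothetical column widths; the real values come from the file specification.
COLUMN_WIDTHS = [5, 10, 3]
TOTAL_ROW_LEN = sum(COLUMN_WIDTHS)


def split_by_length(lines: Iterable[str], expected_len: int) -> Tuple[List[str], List[str]]:
    """Split rows into (good, bad) depending on whether the length matches."""
    good: List[str] = []
    bad: List[str] = []
    for line in lines:
        # Drop only the line terminator; padding spaces are part of the record.
        row = line.rstrip("\r\n")
        if len(row) == expected_len:
            good.append(row)
        else:
            bad.append(row)
    return good, bad


def parse_row(row: str, widths: List[int]) -> List[str]:
    """Slice a row of the correct length into its fixed-width fields."""
    fields: List[str] = []
    start = 0
    for width in widths:
        fields.append(row[start:start + width])
        start += width
    return fields
```

In a Glue job, I assume the same length predicate would be applied twice, once with == for the valid records and once with != for the error records, but I am not sure this is the most efficient way to do it.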