I have a Python 3 "file-like object" whose read() method returns a string. It comes from either an opened file or an object streamed from S3 using boto3.
I want to sanitize the stream before passing it to csv.DictReader, in particular because that module barfs on NUL characters in the input.
The CSV files I'm processing may be large, so I want to do this in a streaming fashion rather than reading the entire file/object into memory.
How do I wrap the input object so that I can clean up every string returned from read() with a call like: .replace('\x00', '{NUL}')?
I think that the io library is where to look, but I couldn't find something that obviously did what I want: to be able to intercept and transform every call to .read() on the underlying file-like object and pass the wrapper to csv, without reading the whole thing at once.
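For illustration, here is a minimal sketch of the kind of wrapper I have in mind. It assumes only that the object's read(size) returns str; the name sanitized_lines and the chunk_size parameter are made up for this example and are not part of io or csv. It is written as a generator because csv.DictReader consumes an iterable of lines rather than calling read() itself, so I am not sure a read()-only wrapper would be enough on its own.

```python
import csv
import io


def sanitized_lines(fileobj, chunk_size=64 * 1024):
    """Yield lines from fileobj with NUL characters replaced.

    Hypothetical helper: only fileobj.read(size) returning str is assumed.
    At most one chunk plus one partial line is held in memory at a time.
    """
    pending = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Clean each chunk as it arrives; NUL is a single character,
        # so it can never be split across a chunk boundary.
        pending += chunk.replace("\x00", "{NUL}")
        # Everything before the last "\n" is a complete line;
        # the remainder is carried over to the next chunk.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line + "\n"
    if pending:
        # Final line without a trailing newline.
        yield pending


# Usage with an in-memory stand-in for the file / S3 stream.
source = io.StringIO("a,b\n1,x\x00y\n2,z")
rows = list(csv.DictReader(sanitized_lines(source, chunk_size=4)))
```

The tiny chunk_size in the usage lines is only there to exercise the chunk boundaries; a real call would leave the default.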