The Apache Beam documentation page "Authoring I/O Transforms - Overview" states:
Reading and writing data in Beam is a parallel task, and using ParDos, GroupByKeys, etc… is usually sufficient. Rarely, you will need the more specialized Source and Sink classes for specific features.
Could someone please provide a very basic example of how to do this in Python?
For example, if I had a local folder containing 100 JPEG images, how would I:
- Use ParDos to read/open the files.
- Run some arbitrary code on the images (for example, convert them to greyscale).
- Use ParDos to write the modified images to a different local folder.
Thanks,