I am considering Google Dataflow as an option for running a pipeline that involves steps like:
- Downloading images from the web;
- Processing images.
I like that Dataflow manages the lifetime of the VMs required to complete a job, so I don't need to start or stop them myself, but all the examples I've come across use it for data-mining-style tasks. I'm wondering whether it is also a viable option for other kinds of batch work, such as image processing and crawling.
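For concreteness, this is roughly the per-URL work I'd expect each worker to run; `fetch_image` and `process_image` are placeholder names I made up, and on Dataflow each would presumably be wrapped in a Beam `DoFn`/`ParDo` step rather than called directly:

```python
# Hypothetical per-element work for the pipeline sketched above.
# fetch_image / process_image are placeholder names, not any real Dataflow API;
# on Dataflow each step would be wrapped in an Apache Beam DoFn.
import hashlib
import urllib.request


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download one image from the web (step 1 of the pipeline)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def process_image(data: bytes) -> dict:
    """Toy stand-in for the processing step (step 2): derive a
    content hash and byte size. Real work (resizing, feature
    extraction, ...) would slot in here instead."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "num_bytes": len(data),
    }
```

My question is essentially whether per-element work like this (network I/O plus CPU-bound processing, rather than aggregation over records) fits Dataflow's batch model.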