I have written an application that does video encoding. The encoding is a pipelined process: first the video is fetched, then it is encoded with ffmpeg, then it is split into multiple parts, and so on.
Over the course of this, a 1 GB video balloons into several GB of intermediate data. The service is written so that a different program can handle each stage of the pipeline, with the stages coordinated via RabbitMQ. Of course, the process doesn't have to run this way, which brings me to my question.
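For context, each stage is just a small worker that pulls a job off one queue, does its piece of the work, and pushes a message onto the next stage's queue. Roughly like this (a minimal sketch in Python with pika; the queue names, paths, and ffmpeg flags are placeholders, not my actual code):

```python
# Sketch of one pipeline stage: consume a "fetched" job, run ffmpeg on
# the file, then publish a message for the next stage (the splitter).
# Queue names, paths, and ffmpeg flags are placeholders.
import json
import subprocess

import pika


def encode(src_path: str, dst_path: str) -> None:
    """Re-encode the fetched video with ffmpeg (H.264 as an example)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-c:v", "libx264", dst_path],
        check=True,
    )


def on_message(channel, method, properties, body):
    job = json.loads(body)
    out_path = job["path"] + ".encoded.mp4"
    encode(job["path"], out_path)

    # Hand the result off to the next stage via its queue.
    channel.basic_publish(
        exchange="",
        routing_key="split",
        body=json.dumps({"path": out_path}),
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="encode", durable=True)
channel.queue_declare(queue="split", durable=True)
channel.basic_consume(queue="encode", on_message_callback=on_message)
channel.start_consuming()
```

The point is that today each of these workers could run on a different VM, which is why the intermediate files have to move somewhere between stages.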
I'm looking at storage requirements for making the app "live". With cloud providers, you pay per GB of storage and per GB of transfer. So far so good.
When I transfer this 1 GB video blob from one cloud VM instance to another, or from a VM to the shared storage service, does that count as billable transfer? (I realize the answer will vary with each host's terms of service.)
Would it make more sense to have one VM perform the entire process, and then spin up multiple instances of that VM, as opposed to having each VM perform only a single task in the pipeline? I'm asking in terms of optimizing for cost (lowest storage cost, lowest cost of spinning up VMs). Because the encoding will happen in batches, I am less concerned about pushing out results quickly.
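To make the comparison concrete, this is the back-of-the-envelope model I've been using. Every price and size below is a made-up placeholder (the real per-GB rates would come from the provider), and it assumes a VM's local disk is effectively free while shared storage and traffic between VMs/storage are billed per GB:

```python
# Extra cost attributable to intermediate data under each layout, per video,
# ignoring the transfers both layouts share (fetching the source, shipping
# the final output). All numbers are assumed placeholders.
STORAGE_PER_GB_MONTH = 0.10  # $/GB-month for shared object storage (assumed)
TRANSFER_PER_GB = 0.12       # $/GB moved between VMs / storage (assumed)

intermediate_gb = 4.0  # intermediate data the pipeline generates per video
handoffs = 2           # times intermediates move between stages (encode -> split -> ...)

# Layout A: one VM per stage. Intermediates are parked in shared storage
# and cross the network at every handoff.
extra_pipeline_vms = (
    intermediate_gb * STORAGE_PER_GB_MONTH
    + intermediate_gb * handoffs * TRANSFER_PER_GB
)

# Layout B: one VM runs every stage. Intermediates never leave local disk.
extra_single_vm = 0.0

print(f"one VM per stage: ${extra_pipeline_vms:.2f} extra per video")
print(f"single VM:        ${extra_single_vm:.2f} extra per video")
```

If that model is roughly right, the per-stage layout only pays off if the inter-VM transfer is free or the ability to scale stages independently is worth the overhead, which is exactly what I'm trying to confirm.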
This scenario is somewhat unusual in that I have huge amounts of binary data that cannot be stored efficiently in, say, a database. That raises a similar question for those with experience: when your DB VM sends its results back to your web app, are you charged for that intermediate transfer?
Am I even asking the right questions? Is there a guide that I should read, short of calling hosting providers and asking them about pricing myself?