It does not: once your script starts dubbing the wav files, that's a separate task.
Think of it as a 3-step process (I'm guessing here, since very little information was provided):
- step 1: you send the request --> time determined by "internet speed"
- step 2: files get dubbed --> server-side work, internet speed no longer matters
- step 3: you get the result back --> again determined by internet speed
You have to time the steps separately: run a benchmark on the mixing part alone and see for yourself.
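A minimal sketch of timing each step on its own, assuming Python. The three functions `send_request`, `mix_files` and `fetch_result` are hypothetical placeholders (they just sleep); swap in whatever your script actually does at each step:

```python
import time


def time_phases(phases):
    """Run each (name, callable) pair in order; return elapsed seconds per phase."""
    timings = {}
    for name, func in phases:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for the real steps (assumptions, not your actual code).
def send_request():
    time.sleep(0.05)  # pretend upload: depends on the connection


def mix_files():
    time.sleep(0.10)  # pretend dubbing/mixing: independent of the connection


def fetch_result():
    time.sleep(0.05)  # pretend download: depends on the connection


if __name__ == "__main__":
    results = time_phases([
        ("send request", send_request),
        ("mix files", mix_files),
        ("fetch result", fetch_result),
    ])
    for name, seconds in results.items():
        print(f"{name}: {seconds:.3f} s")
```

If the "mix files" number stays roughly the same on a fast and a slow connection while the other two change, the mixing itself is not network-bound.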
A fun, practical way to see this:
Consider dinner at a restaurant: the time you spend eating doesn't depend on how long it takes you to order or how long the waiter takes to bring you the meal.
Quick edit: I just realized it may depend on internet speed after all, if the dubbing/mixing output is streamed in real time while it is being processed. But that doesn't seem to be your case.