I am trying to mix several .mka files from a Twilio Video Conference recording using FFmpeg. I want one track per participant, but I need to preserve each file's position on the overall timeline of the call.
Concrete example: I have these three files:
0PA1896e43f4ca0edf17d8dbfc0bab95a52.mka
1PA2a640f11bc13af2c29397800f058cb05.mka
2PA9fa5b32edc016f6f5b9669bb9b308d97.mka
These files are all tracks of the same participant, who joined at different times (leaving the meeting and re-entering results in a new file).
I want to mix those files into a single file while keeping the timestamps at which they were recorded. FFProbe shows the start of each of these files:
0PA1896e43f4ca0edf17d8dbfc0bab95a52.mka - Duration: 00:00:17.87, start: 1.360000, bitrate: 78 kb/s
1PA2a640f11bc13af2c29397800f058cb05.mka - Duration: 00:00:22.76, start: 22.521000, bitrate: 78 kb/s
2PA9fa5b32edc016f6f5b9669bb9b308d97.mka - Duration: 00:00:20.36, start: 48.944000, bitrate: 78 kb/s
So the first 1.36 seconds should be silent, then the first file should play (00:00:17.87 long), then the second file should be placed at 22.521 and the third at 48.944. The result would be a single file containing all three recordings, with silence wherever nothing was recorded. In practice, I want each recording delayed by its start offset.
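To make the expected layout concrete, here is a small Python sketch that computes where the audio and the silent gaps would fall, using only the start and duration values FFProbe reports above. The function name `timeline` is made up for illustration, and it assumes the recordings of one participant never overlap.

```python
# (start_seconds, duration_seconds) as reported by ffprobe for the three files
SEGMENTS = [(1.360, 17.87), (22.521, 22.76), (48.944, 20.36)]

def timeline(segments):
    """Return a list of (kind, begin, end) entries covering the output file.

    kind is "silence" or "audio"; times are seconds from the start of the call.
    """
    entries = []
    cursor = 0.0  # end of the audio placed so far
    for start, duration in sorted(segments):
        if start > cursor:
            # Nothing was recorded between cursor and start: pad with silence.
            entries.append(("silence", round(cursor, 3), round(start, 3)))
        end = start + duration
        entries.append(("audio", round(start, 3), round(end, 3)))
        cursor = max(cursor, end)
    return entries

if __name__ == "__main__":
    for kind, begin, end in timeline(SEGMENTS):
        print(f"{kind:8s}{begin:8.3f} -> {end:8.3f}")
```

For the three files above this gives silence from 0 to 1.36, audio until 19.23, silence until 22.521, audio until 45.281, silence until 48.944, and audio until 69.304.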
Imagine I add a 4th recording that starts at minute 2: there would be a gap of silence between recordings 3 and 4.
Or imagine a call with 3 participants where the 3rd one only joins at minute 5. The first 5 minutes of that participant's track should be silent, so I can pass the 3rd participant's audio to the transcription API and still get the correct timestamps.
The reason I want it this way is that I want to transcribe the audio to text and need the exact timestamp at which each piece of text can be heard.
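One possible approach, sketched here and not verified against these files, would be to delay each input by its start offset with FFmpeg's `adelay` filter and combine the delayed streams with `amix`. The Python helper below only builds the command line; `build_mix_command` and the output name `participant.mka` are hypothetical, and the sketch assumes FFmpeg shifts each input to begin at zero by default, so the whole start offset has to be added back as a delay. `amix` may also reduce the volume of each input, which would need checking.

```python
import shlex

def build_mix_command(tracks, output):
    """Build an ffmpeg argv list that places each track at its start offset.

    tracks: list of (path, start_seconds) pairs, e.g. taken from ffprobe.
    output: name of the mixed file to write (hypothetical here).
    """
    cmd = ["ffmpeg"]
    for path, _ in tracks:
        cmd += ["-i", path]

    if len(tracks) == 1:
        # Single recording: only the leading silence is needed, no mixing.
        ms = round(tracks[0][1] * 1000)
        graph = f"[0:a]adelay={ms}|{ms}[out]"
    else:
        chains, labels = [], []
        for i, (_, start) in enumerate(tracks):
            ms = round(start * 1000)  # adelay takes milliseconds
            # "ms|ms" delays both channels in case the track is stereo.
            chains.append(f"[{i}:a]adelay={ms}|{ms}[a{i}]")
            labels.append(f"[a{i}]")
        chains.append(
            f"{''.join(labels)}amix=inputs={len(tracks)}:duration=longest[out]"
        )
        graph = ";".join(chains)

    cmd += ["-filter_complex", graph, "-map", "[out]", output]
    return cmd

if __name__ == "__main__":
    tracks = [
        ("0PA1896e43f4ca0edf17d8dbfc0bab95a52.mka", 1.360),
        ("1PA2a640f11bc13af2c29397800f058cb05.mka", 22.521),
        ("2PA9fa5b32edc016f6f5b9669bb9b308d97.mka", 48.944),
    ]
    print(shlex.join(build_mix_command(tracks, "participant.mka")))
```

The same helper would cover the late-joining participant case: a single track with a start of 300 seconds yields just an `adelay` of 300000 ms, with no mixing step.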