How to insert silence of a specific duration at an arbitrary position in an MP3 file?

Question

Scenario

I have a bunch of MP3 files: some have a constant bitrate and others a variable bitrate, some are encoded at 128 kbps and others at different bitrates, and some are stereo while others are joint stereo. All of them are sampled at 44,100 Hz.

To automate a task across these thousands of MP3 files, I'm trying to develop an algorithm that inserts silence of an arbitrary duration at an arbitrary position (e.g. insert 500 ms of silence into one MP3 file at position 00:02:30, then insert 750 ms of silence into another MP3 file at position 00:40:02).

Research

The only information I found is about inserting silence at the start or at the end of an MP3 file. That is not what I want, because I need to insert silence at an arbitrary position. For most files I will need to add silence near the middle, only rarely at the start, and never at the end.

Some answers suggest using the SoX or FFmpeg command-line applications to insert silence at the start or the end of an MP3 file. I don't know whether these apps could serve my purpose, but in any case my goal is to do this in C# or VB.NET without depending on any third-party app. That way I keep full control over the modifications I make to the file, and I can programmatically handle the resulting file to perform other tasks with it (inserting silence is just one of several things I need to do with these MP3 files).

However, I'm fine with depending on an external library. I remembered NAudio, a great .NET library for audio manipulation, and found this interesting article, which is not about inserting silence but about concatenating files:

https://markheath.net/post/concatenating-sample-providers-in-naudio

I think NAudio could give me a way to develop an algorithm that inserts silence of a specific duration at a specific position.

Approaches

Admittedly, I don't have enough knowledge to work out how to do this.

One approach I came up with is simply inserting zeroes at a specific position of the stream. I know how to do that, but how am I supposed to translate a zero byte into milliseconds to calculate the duration of the silence? I don't know whether inserting a sequence of zeroes would even work as silence in an MP3 file; if it does, I don't know how to convert that sequence into a duration, nor whether the approach would be safe for every MP3 variant (CBR, VBR, ABR, mono or stereo, etc.).

The second approach I thought of is to use an audio editor to generate an MP3 file containing 1 millisecond of silence, and then insert that silence as many times as required at a specific position of the MP3 stream. I think I would need to generate this 1 ms MP3 file for every possible CBR bitrate, but what about VBR and ABR? I'm stuck with this idea.
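For reference, a quick calculation shows why a 1 ms MP3 piece is not a usable building block. Assuming MPEG-1 Layer III (which covers 44,100 Hz files), every frame carries 1152 samples per channel, so the smallest amount of audio an MP3 stream can hold, and the smallest unit that can be inserted without re-encoding, is one frame. The C# sketch below (class and method names are hypothetical) computes the frame duration and how many whole frames cover a requested silence. It only addresses granularity; it ignores the bit reservoir, through which a frame's audio data may begin inside earlier frames, so splicing frames together is not always clean.

```csharp
using System;

public static class Mp3FrameMath
{
    // MPEG-1 Layer III frames always carry 1152 samples per channel.
    private const int SamplesPerFrame = 1152;

    // Duration of a single frame in milliseconds at the given sample rate.
    public static double FrameDurationMs(int sampleRate) =>
        SamplesPerFrame * 1000.0 / sampleRate;

    // Smallest whole number of frames whose total duration covers the requested silence.
    public static int FramesNeeded(double silenceMs, int sampleRate) =>
        (int)Math.Ceiling(silenceMs / FrameDurationMs(sampleRate));
}
```

At 44,100 Hz one frame lasts about 26.12 ms, so the 500 ms from the example rounds up to 20 frames (about 522.4 ms), and 750 ms rounds up to 29 frames.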

Probably things will turn out much easier than I think, and NAudio could surely help me accomplish this task, or at least a big part of it, with less effort.

Question

How can I insert silence of a specific duration at a specific position in an MP3 file of unknown format (CBR, VBR, ABR, mono or stereo, joint stereo, 128 or 320 kbps, etc.) using C# or VB.NET, with or without the help of NAudio or another .NET library?

Requirements

NOT USING THIRD-PARTY COMMAND-LINE APPLICATIONS, nor automating GUI apps.
The file modifications must be lossless, that is, done without re-encoding the file, the same way MP3DirectCut lets you insert silence or cut & paste without re-encoding.

Ideally, I'd appreciate an implementation of a reusable, universal function like the one below; I came up with this parameter prototype to try to keep things simple:

 public static MemoryStream InsertSilence(
     Stream inputFile,         // the raw file stream data
     TimeSpan startPosition,   // e.g. new TimeSpan(0, 2, 10)
     TimeSpan silenceDuration  // e.g. TimeSpan.FromSeconds(10)
 ) {
     // Do the work, save the data into a new stream and return it.
     return null;
 }

Scott Stensland · Answer 1 · 2020-11-28T23:09:14.147

sample-level manipulation of digital audio happens when the audio is in PCM format, also called raw audio ... every audio codec ( mp3 etc. ) can be decoded into PCM -> do your manipulations -> then encode the PCM into any audio codec

once in PCM format, identify the range of your audio curve's wobble to determine its zero crossing ... in PCM each audio sample ( point on the audio curve ) is typically an integer ( 16 bit, 24 bit, 32 bit, etc. ) ... if it were an unsigned 16 bit integer its values would vary from 0 to 2^16 - 1 ( 0 to 65535 ), in which case its zero crossing is the middle of that range ... so pay attention to whether you have signed or unsigned integers: unsigned integers only hold values from zero on up, whereas signed integers can store negative values ... in WAV files only 8 bit samples are unsigned ( 0 to 255, zero crossing 128 ), while 16 bit and wider integer samples are signed, so their zero crossing is zero ... in either case the zero crossing is the middle value of your integer's possible range
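A small C# sketch of the rule above, assuming integer PCM as stored in WAV files (the `PcmSilence` class and its methods are hypothetical helpers, not part of any library). It returns the silence value for a bit depth and builds a buffer of silent sample frames. 32-bit IEEE float silence (0.0f, the format NAudio's `AudioFileReader` hands out) is also all zero bytes, so the 32-bit case covers it too.

```csharp
using System;

public static class PcmSilence
{
    // Silence (the middle of the value range) for integer PCM as stored in WAV files.
    public static int SilenceValue(int bitsPerSample)
    {
        switch (bitsPerSample)
        {
            case 8:
                return 128;  // unsigned 0..255
            case 16:
            case 24:
            case 32:
                return 0;    // signed two's complement (0.0f float is also all zero bytes)
            default:
                throw new ArgumentOutOfRangeException(nameof(bitsPerSample));
        }
    }

    // Buffer holding 'frames' sample frames of silence (one sample per channel per frame).
    public static byte[] SilenceBytes(int frames, int channels, int bitsPerSample)
    {
        byte fill = SilenceValue(bitsPerSample) == 128 ? (byte)0x80 : (byte)0x00;
        var buffer = new byte[frames * channels * (bitsPerSample / 8)];
        for (int i = 0; i < buffer.Length; i++) buffer[i] = fill;
        return buffer;
    }
}
```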

to add silence, insert into your PCM array a run of samples all set to your zero crossing value ( which you know from the sample bit depth ) ... the number of samples to insert is duration in seconds x sample rate x number of channels
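To make that arithmetic concrete, here is a hedged C# sketch (the `PcmInsert` helper is hypothetical) that converts a time position and a duration into byte offsets and splices silence into an interleaved PCM buffer. It assumes the raw payload is already in a byte array and that `bitsPerSample` is a multiple of 8. Working in whole sample frames (blockAlign = channels x bytes per sample) guarantees the insertion never splits a stereo pair.

```csharp
using System;

public static class PcmInsert
{
    // Whole sample frames (one sample per channel) covered by a time span.
    public static int FramesFor(TimeSpan time, int sampleRate) =>
        checked((int)Math.Round(time.TotalSeconds * sampleRate));

    // Returns a copy of an interleaved PCM payload with silence spliced in at 'start'.
    public static byte[] InsertSilence(byte[] pcm, int sampleRate, int channels, int bitsPerSample,
                                       TimeSpan start, TimeSpan duration)
    {
        int blockAlign = channels * (bitsPerSample / 8);              // bytes per sample frame
        int offset = checked(FramesFor(start, sampleRate) * blockAlign);
        int silenceLength = checked(FramesFor(duration, sampleRate) * blockAlign);
        if (offset < 0 || offset > pcm.Length) throw new ArgumentOutOfRangeException(nameof(start));
        if (silenceLength < 0) throw new ArgumentOutOfRangeException(nameof(duration));

        var result = new byte[pcm.Length + silenceLength];            // new bytes start as 0x00
        Array.Copy(pcm, 0, result, 0, offset);                        // audio before the insertion point
        if (bitsPerSample == 8)                                       // unsigned 8-bit silence is 0x80
            for (int i = offset; i < offset + silenceLength; i++) result[i] = 0x80;
        Array.Copy(pcm, offset, result, offset + silenceLength, pcm.Length - offset); // audio after it
        return result;
    }
}
```

For 16-bit stereo at 44,100 Hz, blockAlign is 4, so 500 ms of silence is 22,050 sample frames, or 88,200 zero bytes.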

pay attention to the notion of endianness ... a canonical WAV file has a 44 byte header section followed by a payload in PCM format ( real files can carry extra chunks, so read the header instead of assuming 44 bytes ) ... as you walk across the payload to parse the next audio sample, if your bit depth ( as identified in the header section ) is say 16 bits then an audio sample takes two bytes, and endianness determines whether the most significant byte comes first or last in that pair ... WAV is little-endian, so the least significant byte comes first
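Reading and writing one 16-bit sample by hand shows what little-endian means in practice. This hypothetical C# helper deliberately avoids `BitConverter`, whose byte order follows the host machine, so the result is the same on any CPU.

```csharp
using System;

public static class PcmSamples
{
    // Reads the signed 16-bit sample at sampleIndex: low byte first, high byte second.
    public static short ReadSample16(byte[] payload, int sampleIndex) =>
        unchecked((short)(payload[sampleIndex * 2] | (payload[sampleIndex * 2 + 1] << 8)));

    // Writes a signed 16-bit sample in little-endian order.
    public static void WriteSample16(byte[] payload, int sampleIndex, short value)
    {
        payload[sampleIndex * 2] = (byte)(value & 0xFF);             // least significant byte
        payload[sampleIndex * 2 + 1] = (byte)((value >> 8) & 0xFF);  // most significant byte
    }
}
```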

mono is easiest, so I highly suggest you get your code working with mono only, not multi channel like stereo ... only add multi channel once you succeed with mono ( in multi channel PCM the samples are interleaved, e.g. left, right, left, right for stereo )

top tip: first convert your mp3 into WAV, then do your manipulation, then encode back into mp3 ( keep in mind that re-encoding to mp3 is lossy )
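Following that tip, the question's `InsertSilence` prototype can be implemented for WAV input without any library. This is a sketch under strong assumptions: the stream is a canonical integer-PCM WAV with a 44-byte header (a 16-byte `fmt ` chunk immediately followed by the `data` chunk), and any chunks after `data` are dropped. The helper names are hypothetical. It splices silence into the payload and then patches the RIFF and `data` chunk sizes. Decoding the MP3 to WAV beforehand and encoding back afterwards is still required, and that round trip is lossy.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavSilence
{
    // Sketch: handles only canonical integer-PCM WAV files with a 44-byte header.
    public static MemoryStream InsertSilence(Stream inputFile, TimeSpan startPosition, TimeSpan silenceDuration)
    {
        if (startPosition < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(startPosition));
        if (silenceDuration < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(silenceDuration));

        var buffer = new MemoryStream();
        inputFile.CopyTo(buffer);
        byte[] wav = buffer.ToArray();

        if (wav.Length < 44 ||
            Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE" ||
            Encoding.ASCII.GetString(wav, 36, 4) != "data" ||
            ReadUInt16LE(wav, 20) != 1)                 // format tag 1 = integer PCM
            throw new NotSupportedException("Only canonical 44-byte-header PCM WAV is handled.");

        int sampleRate = ReadInt32LE(wav, 24);
        int blockAlign = ReadUInt16LE(wav, 32);         // bytes per sample frame (all channels)
        int bitsPerSample = ReadUInt16LE(wav, 34);
        int declared = ReadInt32LE(wav, 40);
        // Some writers leave the data size as 0 or 0xFFFFFFFF; fall back to the bytes actually present.
        int dataSize = declared <= 0 || declared > wav.Length - 44 ? wav.Length - 44 : declared;

        // Convert both times to whole sample frames, then to bytes.
        int offset = checked((int)Math.Round(startPosition.TotalSeconds * sampleRate) * blockAlign);
        int silenceLength = checked((int)Math.Round(silenceDuration.TotalSeconds * sampleRate) * blockAlign);
        if (offset > dataSize) throw new ArgumentOutOfRangeException(nameof(startPosition));

        var output = new byte[44 + dataSize + silenceLength];
        Array.Copy(wav, 0, output, 0, 44 + offset);                      // header + audio before the insertion point
        byte fill = bitsPerSample == 8 ? (byte)0x80 : (byte)0x00;        // 8-bit PCM is unsigned
        for (int i = 0; i < silenceLength; i++) output[44 + offset + i] = fill;
        Array.Copy(wav, 44 + offset, output, 44 + offset + silenceLength, dataSize - offset); // the rest

        WriteInt32LE(output, 4, output.Length - 8);                      // RIFF chunk size
        WriteInt32LE(output, 40, dataSize + silenceLength);              // data chunk size
        return new MemoryStream(output);
    }

    private static int ReadUInt16LE(byte[] b, int i) => b[i] | (b[i + 1] << 8);

    private static int ReadInt32LE(byte[] b, int i) =>
        b[i] | (b[i + 1] << 8) | (b[i + 2] << 16) | (b[i + 3] << 24);

    private static void WriteInt32LE(byte[] b, int i, int v)
    {
        for (int k = 0; k < 4; k++) b[i + k] = (byte)((v >> (8 * k)) & 0xFF); // least significant byte first
    }
}
```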

I really appreciate your help and the time you took writing this; it's full of useful information, but all those details about the PCM/WAVE format and how to modify it manually are too advanced for my knowledge. - ElektroStudios, Nov 28 '20 at 21:08

ElektroStudios · Answer 2 · 2020-11-29T19:07:04.550

In the end I managed to do it with NAudio by reading this doc demonstrating the usage of the OffsetSampleProvider class, and working out how to adapt it to my needs.

The biggest downside is that my solution does not modify the MP3 file directly (the way MP3DirectCut does, for example): I need to save the result in WAVE format and then encode it to MP3.

I don't know whether NAudio can make those kinds of direct modifications and save the modified stream directly in MP3 format without re-encoding it, but I think my solution may be the closest one to my problem, at least using a third-party library. Maybe things can't be perfect.

For this reason I'm posting this answer as a workaround, not as a definitive solution, in case someone can post an answer using NAudio that meets my original requirement:

The file modifications must be lossless, that is, done without re-encoding the file, the same way MP3DirectCut lets you insert silence or cut & paste without re-encoding.

So this is what I have done, in VB.NET:

Imports System.Diagnostics
Imports NAudio.Wave
Imports NAudio.Wave.SampleProviders

''' <summary>Inserts silence starting at a specific position
''' in the source <see cref="AudioFileReader"/>.</summary>
''' <param name="fileReader">The source <see cref="AudioFileReader"/>.</param>
''' <param name="startPosition">Position at which to insert the silence.</param>
''' <param name="silenceDuration">Duration of the silence.</param>
''' <returns>The resulting <see cref="ISampleProvider"/>.</returns>
<DebuggerStepThrough>
Public Shared Function InsertSilence(fileReader As AudioFileReader,
                                     startPosition As TimeSpan,
                                     silenceDuration As TimeSpan) As ISampleProvider

    Dim currentPosition As Long = fileReader.Position ' Save stream position

    ' Take audio from the beginning of the file until {startPosition}
    Dim first As New OffsetSampleProvider(fileReader) With {
        .Take = startPosition
    }

    ' Take the remaining audio until the end of the file.
    ' Both providers share the same reader, so this one continues where {first} stops;
    ' Take = TimeSpan.Zero means "no limit".
    Dim second As New OffsetSampleProvider(fileReader) With {
        .Take = TimeSpan.Zero
    }

    fileReader.Position = currentPosition ' Restore stream position

    ' {first}, then {silenceDuration} of silence, then {second}
    Return first.FollowedBy(silenceDuration, second)

End Function

Usage example:

Dim sourceFile As String = "C:\File.mp3"
Dim outputFile As String = "C:\Output.wav" ' It must be a WAVE file

Using reader As New AudioFileReader(sourceFile)
    Dim position As New TimeSpan(0, 0, 0, 1, 500) ' 1.5 seconds
    Dim duration As TimeSpan = TimeSpan.FromMilliseconds(1000)
    Dim result As ISampleProvider = InsertSilence(reader, position, duration)

    WaveFileWriter.CreateWaveFile16(outputFile, result)
End Using

UPDATE

The function with minor improvements:

Public Shared Function InsertSilence(wave As WaveStream, startPosition As TimeSpan, duration As TimeSpan) As IWaveProvider

    If (duration.TotalMilliseconds < 0) Then
        Throw New ArgumentException(message:=$"'{NameOf(duration)}' time can't be negative.", paramName:=NameOf(duration))
    End If

    If (startPosition.TotalMilliseconds < 0) Then
        Throw New ArgumentException(message:=$"'{NameOf(startPosition)}' time can't be negative.", paramName:=NameOf(startPosition))
    End If

    If (startPosition > wave.TotalTime) Then
        Throw New ArgumentException(message:=$"'{NameOf(startPosition)}' time can't be longer than source wave total time.", paramName:=NameOf(startPosition))
    End If

    Dim sourceProvider As ISampleProvider = wave.ToSampleProvider()
    Dim currentPosition As Long = wave.Position ' Save stream position

    ' Take audio from the beginning of file until {startPosition}
    Dim offset1 As New OffsetSampleProvider(sourceProvider) With {
        .Take = startPosition
    }

    ' Take audio after {startPosition} until the end of file
    Dim offset2 As New OffsetSampleProvider(sourceProvider) With {
        .Take = TimeSpan.Zero
    }

    wave.Position = currentPosition ' Restore stream position
    Return (offset1.FollowedBy(duration, offset2)).ToWaveProvider()

End Function