I am working in Python to isolate elements from music. To train a model, I break my audio into frames and have a label (1 or 0) for each frame. Unfortunately, due to rounding errors, my label vector is always 1 or 2 frames short.
Converting my audio to frames, I get an MFCC array of shape (13, 3709):
import librosa

s = []
for y in audio:
    # 13 MFCCs per frame: 2048-sample windows hopped every 1024 samples at 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=16000, n_mfcc=13, n_fft=2048, hop_length=1024)
    s.append(mfcc)
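If I understand librosa's default center=True padding correctly, the signal is padded before framing, so the number of MFCC frames works out to 1 + len(y) // hop_length rather than a plain duration-over-hop division. A quick check I can run, reusing y and mfcc from the loop above:

expected = 1 + len(y) // 1024   # frame count with librosa's default center=True
print(expected, mfcc.shape[1])  # these should agree, e.g. 3709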
Converting my text file (for the mp3 I am working with) from milliseconds to frame numbers, I get a label vector of length 3708:
import numpy as np

output = []
for block in textCorpus:
    block_start = int(float(block[0]) * 16000 / 1024)  # converted to frame number
    block_end = int(float(block[1]) * 16000 / 1024)    # converted to frame number
    singing = block[2]
    block_range = np.arange(block_start, block_end, 1)  # step size is 1 (per frame number)
    # extraneous code
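One workaround I am considering (just a sketch; labels and n_frames are placeholder names, and I assume block[2] is already 0 or 1) is to allocate the label vector at the length librosa reports and clip each block's frame range to it:

n_frames = mfcc.shape[1]               # frame count reported by librosa, e.g. 3709
labels = np.zeros(n_frames, dtype=int) # hypothetical per-frame label vector
for block in textCorpus:
    start = int(float(block[0]) * 16000 / 1024)
    end = min(int(float(block[1]) * 16000 / 1024), n_frames)  # clip so rounding never overruns
    labels[start:end] = int(block[2])  # mark every frame in this block with its label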
I have tried using Decimal, math.floor, and math.ceil within my block_start and block_end variables, but I can't seem to match my audio frame length.
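For reference, my floor/ceil attempt looked roughly like this (reconstructed, not verbatim), and it still leaves the label vector a frame or two short:

import math

block_start = math.floor(float(block[0]) * 16000 / 1024)
block_end = math.ceil(float(block[1]) * 16000 / 1024)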