I am using oboe to play back sound files on Android. I have both 44.1 kHz and 48 kHz files which I want to play back on the same audio stream, so I need to resample. Decoding and playing the files works fine, but since I have two different sample rates I need to convert one of them (44.1 kHz to 48 kHz is what I'm currently trying, since my audio stream runs at 48 kHz).
So I'm trying to do the resampling using oboe's resampler, but I don't completely understand how to. Following the readme guide for converting a fixed number of input frames (I assume that is what I have to do?), I tried the implementation below. The first part of the code acquires the decoded data and returns if the sample rates are equal (this part works as intended); the second part is where I try to resample if necessary:
StorageDataSource *StorageDataSource::newFromStorageAsset(AMediaExtractor &extractor,
                                                          const char *fileName,
                                                          AudioProperties targetProperties) {
    std::ifstream stream;
    stream.open(fileName, std::ifstream::in | std::ifstream::binary);
    stream.seekg(0, std::ios::end);
    long size = stream.tellg();
    stream.close();

    constexpr int kMaxCompressionRatio{12};
    const long maximumDataSizeInBytes =
            kMaxCompressionRatio * (size) * sizeof(int16_t);
    auto decodedData = new uint8_t[maximumDataSizeInBytes];

    int32_t rate = NDKExtractor::getSampleRate(extractor);
    int32_t *inputSampleRate = &rate;

    int64_t bytesDecoded = NDKExtractor::decode(extractor, decodedData, targetProperties);
    auto numSamples = bytesDecoded / sizeof(int16_t);
    auto outputBuffer = std::make_unique<float[]>(numSamples);

    // The NDK decoder can only decode to int16, we need to convert to floats
    oboe::convertPcm16ToFloat(
            reinterpret_cast<int16_t *>(decodedData),
            outputBuffer.get(),
            bytesDecoded / sizeof(int16_t));

    if (*inputSampleRate == targetProperties.sampleRate) {
        return new StorageDataSource(std::move(outputBuffer),
                                     numSamples,
                                     targetProperties);
    } else {
        // this is where I try to convert the sample rate
        float *inputBuffer;
        inputBuffer = reinterpret_cast<float *>(decodedData); // is this correct?
        float *outputBuffer2;  // multi-channel buffer to be filled, TODO improve name
        int numInputFrames;    // number of frames of input
        // TODO is this correct?
        numInputFrames = numSamples / 2;
        int numOutputFrames = 0;
        int channelCount = 2;

        resampler::MultiChannelResampler *mResampler = resampler::MultiChannelResampler::make(
                2,      // channel count
                44100,  // input sampleRate
                48000,  // output sampleRate
                resampler::MultiChannelResampler::Quality::Best); // conversion quality

        int inputFramesLeft = numInputFrames;
        while (inputFramesLeft > 0) {
            if (mResampler->isWriteNeeded()) {
                mResampler->writeNextFrame(inputBuffer);
                inputBuffer += channelCount;
                inputFramesLeft--;
            } else {
                mResampler->readNextFrame(outputBuffer2);
                outputBuffer2 += channelCount;
                numOutputFrames++;
            }
        }
        delete mResampler;
        // return is missing!
    }

    // returning the original data since the code above doesn't work properly yet
    return new StorageDataSource(std::move(outputBuffer),
                                 numSamples,
                                 targetProperties);
}
The resampling crashes with a SIGSEGV:
A: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fe69c7000
A: x0 0000007c0e3d1e00 x1 0000007fe69c7000 x2 0000007bb77dd198 x3 0000007bf5432140
A: x4 0000000000000021 x5 8080800000000000 x6 fefeff7b976e0667 x7 7f7f7f7fff7f7f7f
A: x8 0000000000000660 x9 0000000000000660 x10 0000000000000000 x11 0000007bf5435840
A: x12 0000007bb77dd118 x13 0000000000000008 x14 0000007bf54321c0 x15 0000000000000008
A: x16 0000007bf5432200 x17 0000000000000000 x18 0000007fe69bf7ba x19 0000007c14e14c00
A: x20 0000000000000000 x21 0000007c14e14c00 x22 0000007fe69c0d70 x23 0000007bfc6e5dc7
A: x24 0000000000000008 x25 0000007c9b7705f8 x26 0000007c14e14ca0 x27 0000000000000002
A: x28 0000007fe69c0aa0 x29 0000007fe69c0420
A: sp 0000007fe69c0400 lr 0000007bf94f61f0 pc 0000007bf9501b5c
A: backtrace:
A: #00 pc 0000000000078b5c /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (resampler::PolyphaseResamplerStereo::readFrame(float*)+684)
A: #01 pc 000000000006d1ec /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (resampler::MultiChannelResampler::readNextFrame(float*)+44)
A: #02 pc 000000000006c84c /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (StorageDataSource::newFromStorageAsset(AMediaExtractor&, char const*, AudioProperties)+1316)
A: #03 pc 78bbcdd7f9b20dbe <unknown>
Here are my main problems. First, how do I properly get the number of frames my input has? How exactly do frames work with audio data? I did research this, but I'm still not sure I understand it. Is the frame count a constant? How can I calculate it, and how does it relate to the samples, the sample rate and the bit rate?
Second, am I using the correct input data at all? I pass my decodedData value, since that is what I get back from the decoder, and simply reinterpret_cast it to float*.
Since I am fairly inexperienced with C++, I am not sure whether what I am doing is correct; I may well be introducing several errors in this bit of code.
Edit: Since I am trying to resample my decoded output, I assume this bit of information about PCM from here explains what is meant by frames in this context:
For encodings like PCM, a frame consists of the set of samples for all channels at a given point in time, and so the size of a frame (in bytes) is always equal to the size of a sample (in bytes) times the number of channels.
Is this correct in my case? That would mean I can deduce the number of frames from the number of samples, the length of my decoded audio and the channel count?