
I am using oboe to play back sound files on Android. I have both 44.1kHz and 48kHz files which I want to be able to play back on the same audio stream, therefore I need to resample.

Decoding and playing the files works fine, but since I have two different sample rates I need to resample (44.1 kHz to 48 kHz is what I'm currently trying, since my audio stream runs at 48 kHz).

So I'm trying to do the resampling using oboe's resampler, but I don't completely understand how to use it. Following the README guide for converting a fixed number of input frames (I assume that is what I have to do?), I tried to implement it as follows. The first part of the code acquires the decoded data and returns early if the sample rates are equal (this part works as intended); the second part is where I try to resample if necessary:

StorageDataSource *StorageDataSource::newFromStorageAsset(AMediaExtractor &extractor,
                                                          const char *fileName,
                                                          AudioProperties targetProperties) {

    std::ifstream stream;
    stream.open(fileName, std::ifstream::in | std::ifstream::binary);
    stream.seekg(0, std::ios::end);
    long size = stream.tellg();
    stream.close();

    constexpr int kMaxCompressionRatio{12};
    const long maximumDataSizeInBytes =
            kMaxCompressionRatio * (size) * sizeof(int16_t);
    auto decodedData = new uint8_t[maximumDataSizeInBytes];

    int32_t rate = NDKExtractor::getSampleRate(extractor);
    int32_t *inputSampleRate = &rate;

    int64_t bytesDecoded = NDKExtractor::decode(extractor, decodedData, targetProperties);
    auto numSamples = bytesDecoded / sizeof(int16_t);

    auto outputBuffer = std::make_unique<float[]>(numSamples);

    // The NDK decoder can only decode to int16, we need to convert to floats
    oboe::convertPcm16ToFloat(
            reinterpret_cast<int16_t *>(decodedData),
            outputBuffer.get(),
            bytesDecoded / sizeof(int16_t));

    if (*inputSampleRate == targetProperties.sampleRate) {
        return new StorageDataSource(std::move(outputBuffer),
                                     numSamples,
                                     targetProperties);
    } else {

        // this is where I try to convert the sample rate

        float *inputBuffer;
        inputBuffer = reinterpret_cast<float *>(decodedData); // is this correct?

        float *outputBuffer2;    // multi-channel buffer to be filled, TODO improve name
        int numInputFrames;  // number of frames of input

        // TODO is this correct?
        numInputFrames = numSamples / 2;

        int numOutputFrames = 0;
        int channelCount = 2;  

        resampler::MultiChannelResampler *mResampler = resampler::MultiChannelResampler::make(
                2, // channel count
                44100, // input sampleRate
                48000, // output sampleRate
                resampler::MultiChannelResampler::Quality::Best); // conversion quality

        int inputFramesLeft = numInputFrames;

        while (inputFramesLeft > 0) {

            if (mResampler->isWriteNeeded()) {
                mResampler->writeNextFrame(inputBuffer);
                inputBuffer += channelCount;
                inputFramesLeft--;
            } else {
                mResampler->readNextFrame(outputBuffer2);
                outputBuffer2 += channelCount;
                numOutputFrames++;
            }
        }
        delete mResampler;

// return is missing!
    }

// returning the original data since above code doesn't work properly yet
    return new StorageDataSource(std::move(outputBuffer),
                                 numSamples,
                                 targetProperties);
}

The resampling crashes with a SIGSEV:

A: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fe69c7000
A:     x0  0000007c0e3d1e00  x1  0000007fe69c7000  x2  0000007bb77dd198  x3  0000007bf5432140
A:     x4  0000000000000021  x5  8080800000000000  x6  fefeff7b976e0667  x7  7f7f7f7fff7f7f7f
A:     x8  0000000000000660  x9  0000000000000660  x10 0000000000000000  x11 0000007bf5435840
A:     x12 0000007bb77dd118  x13 0000000000000008  x14 0000007bf54321c0  x15 0000000000000008
A:     x16 0000007bf5432200  x17 0000000000000000  x18 0000007fe69bf7ba  x19 0000007c14e14c00
A:     x20 0000000000000000  x21 0000007c14e14c00  x22 0000007fe69c0d70  x23 0000007bfc6e5dc7
A:     x24 0000000000000008  x25 0000007c9b7705f8  x26 0000007c14e14ca0  x27 0000000000000002
A:     x28 0000007fe69c0aa0  x29 0000007fe69c0420
A:     sp  0000007fe69c0400  lr  0000007bf94f61f0  pc  0000007bf9501b5c
A: backtrace:
A:     #00 pc 0000000000078b5c  /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (resampler::PolyphaseResamplerStereo::readFrame(float*)+684)
A:     #01 pc 000000000006d1ec  /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (resampler::MultiChannelResampler::readNextFrame(float*)+44)
A:     #02 pc 000000000006c84c  /data/app/myapp-G-GmPWmPgOGfffk-qHsQxw==/lib/arm64/libnative-lib.so (StorageDataSource::newFromStorageAsset(AMediaExtractor&, char const*, AudioProperties)+1316)
A:     #03 pc 78bbcdd7f9b20dbe  <unknown>

Here are my main problems: First, how do I properly determine the number of frames my input has? How exactly do frames work with audio data? I did research this, but I'm still not sure I understand it. Is it a constant number? How can I calculate the number of frames, and how does it correlate with samples, the sample rate, and the bit rate?

Second, do I use the correct input data at all? I use my decodedData value, since that is what I get back from the decoder, and just reinterpret_cast it to float*.

Since I am fairly inexperienced with C++, I am not sure whether what I do is correct, and I may be introducing multiple errors in this bit of code.

Edit: Since I am trying to resample my decoded output, I assume this bit of information about PCM from here explains what is meant by frames here:

For encodings like PCM, a frame consists of the set of samples for all channels at a given point in time, and so the size of a frame (in bytes) is always equal to the size of a sample (in bytes) times the number of channels.

Is this correct in my case? That would mean I can deduce the number of frames from the number of samples, the length of my audio clip, and the channel count?

michpohl
  • Unfortunately, I can't answer all questions, but I want to try to help a bit with understanding of frames. "How can I calculate the number of frames." - for PCM format (*.wav) the total number of frames is "duration in seconds * frame rate". Let's say you have an audio with 44100 frame rate (44.1kHz) and duration 5 seconds. It means that the number of frames will be 44100 * 5. – Karina Lipnyagova Sep 11 '20 at 11:28
  • So that means, a frame contains information for all channels of that file for a specific point in time, like quoted above? So if I have a PCM 44100Hz file, and know its size, do I also somehow know the duration? How is the relation between file size and duration? – michpohl Sep 11 '20 at 11:58
  • @michpohl did you make it work? I need similar thing – Blagojco Jan 02 '21 at 09:58
  • Sadly, no, I haven't. As this was more a nice-to-have for me I skipped it eventually. – michpohl Jan 03 '21 at 15:17

0 Answers