I'm working on a project involving audio processing.

I'm reading a piece of audio from a file and would like to do some processing on it. The issue is that I get the audio data as a byte array, while my processing works on a double array (and later on a Complex array as well).

My question is: how can I correctly convert the byte array I receive into a double array so I can go on?

Here's my input code:

AudioFormat format = new AudioFormat(8000, 16, 1, true, true);
AudioInputStream in = AudioSystem.getAudioInputStream(WAVfile);
AudioInputStream din = null;
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 
                        8000,
                        16,
                        1,
                        2,
                        8000,
                        true);
din = AudioSystem.getAudioInputStream(decodedFormat, in);
TargetDataLine fileLine = AudioSystem.getTargetDataLine(decodedFormat);
fileLine.open(format);
fileLine.start();

int numBytesRead;
byte[] targetData = new byte[256]; // (samplingRate / 1000) * 32ms

while (true) {
    numBytesRead = din.read(targetData, 0, targetData.length);

    if (numBytesRead == -1) {
        break;
    }

    double[] convertedData;
    // Conversion code goes here...

    processAudio(convertedData);
}

So far I've looked at answers to various questions on this site and others. I've tried using ByteBuffer and bit conversion, but neither gave me results that seem right (another member of my team has done the same thing on the same file in Python, so I have a reference for what the results should approximately be).

What am I missing? How can I correctly convert the bytes to doubles? If I want targetData to capture only 32 ms of the file, what should the length of targetData be? What will the length of convertedData be then?

Thanks in advance.

DanielY

2 Answers

The conversion using NIO buffers shouldn't be so hard. All you have to do is apply a factor to normalize from the 16-bit range to a [-1.0…1.0] range.

Strictly speaking it isn't quite that easy, because the signed 16-bit range is asymmetric (-32768 to 32767), but for most practical purposes settling on a single factor is sufficient:

AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 
                                            8000, 16, 1, 2, 8000, true);
try(AudioInputStream in  = AudioSystem.getAudioInputStream(WAVfile);
    AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in);
    ReadableByteChannel inCh = Channels.newChannel(din)) {

    // A ByteBuffer is big-endian by default, which matches decodedFormat
    ByteBuffer inBuf=ByteBuffer.allocate(256);
    // 2.0/65536 == 1.0/32768: maps a signed 16 bit sample to [-1.0, 1.0)
    final double factor=2.0/(1<<16);
    while(inCh.read(inBuf) != -1) {
        inBuf.flip(); // switch the buffer from being filled to being drained
        double[] convertedData=new double[inBuf.remaining()/2];
        DoubleBuffer outBuf=DoubleBuffer.wrap(convertedData);
        while(inBuf.remaining()>=2) {
            outBuf.put(inBuf.getShort()*factor);
        }
        assert !outBuf.hasRemaining();
        inBuf.compact(); // keep a left-over odd byte for the next read
        processAudio(convertedData);
    }
}

The solution above effectively uses the …/(double)0x8000 variant. Since I don't know what processAudio does with the supplied array, e.g. whether it keeps a reference to it, the loop allocates a new array in each iteration, but it should be easy to change it to a reusable one. With a pre-allocated array, you only have to keep track of the actual number of doubles read and converted in each iteration.
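On the buffer-length part of the question, the size can be derived from the format itself rather than hard-coded. The following is only a minimal sketch: the class FrameSizing and the method bytesForMillis are hypothetical names introduced for illustration, while getSampleRate() and getFrameSize() are standard AudioFormat accessors.

```java
import javax.sound.sampled.AudioFormat;

final class FrameSizing {
    /** Bytes needed to hold the given number of milliseconds of audio. */
    static int bytesForMillis(AudioFormat format, int millis) {
        // Frames (one sample per channel) in the window, rounded to nearest
        int frames = Math.round(format.getSampleRate() * millis / 1000f);
        // getFrameSize() is bytes per frame: 2 for 16 bit mono
        return frames * format.getFrameSize();
    }

    public static void main(String[] args) {
        AudioFormat decodedFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED, 8000, 16, 1, 2, 8000, true);
        int bytes = bytesForMillis(decodedFormat, 32);
        // For 16 bit mono, each pair of bytes becomes one double
        int doubles = bytes / 2;
        System.out.println(bytes + " bytes -> " + doubles + " doubles");
    }
}
```

For the format in the question this gives 512 bytes per 32 ms window and therefore 256 doubles, so a 256-byte buffer only covers 16 ms.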

Krzysztof Cichocki
Holger

First, read up on the sample format you are using (AudioFormat.Encoding.PCM_SIGNED, big-endian) and on how Java represents an int. Then combine the two bytes of each sample with the binary shift operators >> and <<: shift the most significant byte 8 bits to the left so that it becomes the upper byte of the integer. Which byte that is depends on the byte order. In big-endian data the most significant byte comes first, so you shift the first byte of each pair; in little-endian data it comes second, so you shift the second byte. Mask the low byte with & 0xFF so that its sign extension doesn't corrupt the upper bits, combine the two parts with the | operator, and cast the result to short to get the signed value. Finally, divide that value to get the range you want in your double. Assuming you want the range -1...+1, divide by 32768.0.

I would post the code here, but I don't have it with me right now. This is the instruction I've followed.

For instance, I've successfully read little-endian stereo audio data using:

AudioFormat format = new AudioFormat(8000, 16, 2, true, false);

and then converting them by:

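   // little-endian: the second byte of each sample is the high byte;
   // the low byte is masked so its sign bit doesn't overwrite the high byte
   int l = (short) ((readedData[i*4+1]<<8)|(readedData[i*4+0]&0xFF));
   int r = (short) ((readedData[i*4+3]<<8)|(readedData[i*4+2]&0xFF));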

so the scaled values should be:

   double scaledL = l/32768d;
   double scaledR = r/32768d;
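
For the mono, big-endian format used in the question, the same shift-and-mask idea might look like the sketch below. The class PcmToDouble and the method bigEndian16ToDouble are hypothetical names for illustration; the length parameter is meant to carry the value returned by read, since a read may deliver fewer bytes than the buffer holds.

```java
import java.util.Arrays;

final class PcmToDouble {
    /**
     * Converts signed 16 bit big-endian PCM bytes to doubles in [-1.0, 1.0).
     * Only the first length bytes are used; a trailing odd byte is ignored.
     */
    static double[] bigEndian16ToDouble(byte[] data, int length) {
        double[] out = new double[length / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = data[2 * i];            // high byte comes first; keep its sign
            int lo = data[2 * i + 1] & 0xFF; // low byte; mask off the sign extension
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Samples 0x4000, 0x8000 and 0x7FFF in big-endian byte order
        byte[] bytes = {0x40, 0x00, (byte) 0x80, 0x00, 0x7F, (byte) 0xFF};
        System.out.println(Arrays.toString(bigEndian16ToDouble(bytes, bytes.length)));
    }
}
```

Values far outside -1...+1 after this kind of conversion usually point to a byte-order mismatch or a missing mask, so it is worth checking both against the format the stream was actually decoded to.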
Krzysztof Cichocki
  • Based on your information and previous answers I've seen, if I iterate through byte array "data", I fill in the output array "realData" like this: realData[i] = (((data[2*i] & 0xFF) << 8) | (data[2*i + 1] & 0xFF)) / 32768.0; Am I correct? – DanielY Jun 07 '16 at 08:04
  • more like: realData[i] = (((data[2*i+1]) << 8) | (data[2*i] & 0xFF)) / 32768.0; – Krzysztof Cichocki Jun 07 '16 at 08:05
  • Ok. Though my results are still not between -1 and 1, they are > 100 – DanielY Jun 07 '16 at 08:09
  • same results. What should be the right length of my input byte array, if I want to have 32ms of data at the time, and the format is as I mentioned in the question? – DanielY Jun 07 '16 at 08:18
  • 8000(samplerate)/1000(milliseconds in second) * 32( milliseconds you need) * 2 (bytes per sample - 16 bit is 2 bytes) – Krzysztof Cichocki Jun 07 '16 at 09:04
  • 512...my thought exactly. But it means that the double array will be half the size... – DanielY Jun 07 '16 at 10:16
  • @Holger I didn't say it as a bad thing, just as a fact :). Basically the issue now (and then, too) is that Java badly converts the x-length array of bytes it reads from the WAV file to an x/2-length array of doubles – DanielY Jun 08 '16 at 04:09