
I want to get values for certain ranges of frequencies of the sound that is played by the smartphone so I can forward them via Bluetooth to a device that visualizes these ranges. Those ranges are:
0-63Hz
63-160Hz
160-400Hz
400-1000Hz
1000-2500Hz
2500-6250Hz
6250-16000Hz

Audio Session Id is 0 so I can use any sound played by the smartphone.

What I found is the Visualizer class, and I thought I could achieve that with its getFft method. However, it seems I can only split the frequencies into equal-sized parts via the capture rate? Or am I completely misunderstanding something here? I tried using the sampling rate as the capture rate so that I would have a value for each frequency, but it just set the capture rate back to 1024.
Or maybe this class just isn't what I want? I think I might completely miss the point here, so any help or explanation (or recommendation of another library) would be welcome.

        val visualizer = Visualizer(0)
        visualizer.scalingMode = Visualizer.SCALING_MODE_NORMALIZED // named constant for 0

        visualizer.setDataCaptureListener(object : Visualizer.OnDataCaptureListener {
            override fun onWaveFormDataCapture(
                vis: Visualizer,
                bytes: ByteArray,
                samplingRate: Int
            ) {

            }

            override fun onFftDataCapture(
                visualizer: Visualizer?,
                fft: ByteArray?,
                samplingRate: Int
            ) {
                //if frequency <=63 do something
                //else if frequency <=160 do something ...
            }

        }, Visualizer.getMaxCaptureRate() / 2, false, true)
        visualizer.enabled = true


Riku55

1 Answer


It is inherent to the math of an FFT that it produces frequency "buckets" that are evenly sized, number half the sample size, and go up to a frequency of half the sample rate. (An FFT actually produces as many buckets as there are samples, but Android's Visualizer discards the second half before delivering the results because it mirrors the first half and so is not useful for visualization.)
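
To make the bucket layout concrete, here is a minimal plain-Kotlin sketch (no Android APIs involved). The helper names binWidthHz and binCenterHz are made up for illustration, and the 44,100 Hz rate mentioned below is only an assumed example value, not something the Visualizer guarantees.

```kotlin
// Hypothetical helpers, not part of the Visualizer API.

/** Width of one FFT bucket in Hz: the sampling rate divided by the capture size. */
fun binWidthHz(samplingRateHz: Int, captureSize: Int): Double =
    samplingRateHz.toDouble() / captureSize

/** Frequency at which bucket k sits. Bucket 0 is the DC offset. */
fun binCenterHz(k: Int, samplingRateHz: Int, captureSize: Int): Double =
    k * binWidthHz(samplingRateHz, captureSize)

/** Number of buckets that are left after the mirrored second half is dropped. */
fun usableBucketCount(captureSize: Int): Int = captureSize / 2
```

With an assumed 44,100 Hz stream and a capture size of 1024, binWidthHz(44_100, 1024) comes out to about 43 Hz. That means the whole 0-63Hz range is covered by a single bucket, while 6250-16000Hz spans more than two hundred of them.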

There is going to be a very limited range of permitted capture sizes and capture rates, based on hardware capabilities and plain old physics. Also, these two properties are inversely proportional: if your capture size is big, your capture rate has to be small. Audio is produced as a stream of evenly timed amplitudes (spaced 1/samplingRate seconds apart). Suppose for simplicity that the audio stream is sampled at only 1024 Hz, producing 1024 amplitudes per second. If your capture rate is 1 per second, you collect all 1024 of those amplitudes each time you capture, so your capture size is 1024. If your capture rate is 2 per second, you collect 512 amplitudes on each capture, so your capture size is 512.
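
The arithmetic in that simplified example can be sketched in a few lines of plain Kotlin. Both function names are hypothetical. The milliHertz conversion is included because the Visualizer reports its sampling rate and capture rate in milliHertz, not Hz.

```kotlin
// Hypothetical helpers for the simplified model above, not part of the Visualizer API.

/** Converts a milliHertz value (the unit Visualizer reports) to whole Hertz. */
fun milliHertzToHertz(milliHertz: Int): Int = milliHertz / 1000

/** Samples that arrive between two consecutive captures in the simplified model. */
fun samplesPerCapture(samplingRateHz: Int, capturesPerSecond: Int): Int {
    require(capturesPerSecond > 0) { "capture rate must be positive" }
    return samplingRateHz / capturesPerSecond
}
```

samplesPerCapture(1024, 1) gives 1024 and samplesPerCapture(1024, 2) gives 512, matching the numbers above.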

Note: what I don't know for sure is whether, if you set a capture size that doesn't inversely match the capture rate passed to setDataCaptureListener, it ignores the size you set or actually repeats/drops data. I always use Visualizer.getMaxCaptureRate() as the capture rate.

What you can do (and it won't be exact) is average the appropriate ranges, although I think you'll want to apply the log function to the magnitude before you average, or the results won't look great. You definitely need to apply a log function to the magnitudes at some point before visualizing them for a visualizer to make sense to the viewer.
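
To show why the order matters, here is a minimal plain-Kotlin sketch comparing the two options. The function names are made up, and the floor of 1 (which keeps a silent bucket from producing -Infinity) is my own assumption, not something the Visualizer requires.

```kotlin
import kotlin.math.log10

/** Mean of the log10 magnitudes. Values below `floor` are clamped to avoid log10(0). */
fun meanLogMagnitude(magnitudes: FloatArray, floor: Float = 1f): Float {
    require(magnitudes.isNotEmpty()) { "need at least one magnitude" }
    var sum = 0f
    for (m in magnitudes) sum += log10(maxOf(m, floor))
    return sum / magnitudes.size
}

/** log10 of the mean magnitude, shown only for comparison. */
fun logOfMeanMagnitude(magnitudes: FloatArray, floor: Float = 1f): Float {
    require(magnitudes.isNotEmpty()) { "need at least one magnitude" }
    return log10(maxOf(magnitudes.average().toFloat(), floor))
}
```

For the magnitudes 1 and 100, the mean of the logs is 1.0 while the log of the mean is about 1.7, so a single loud bucket dominates the range much more when you average first.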

So after selecting a capture size you can prepare ranges to use for collecting the results.

private val targetEndpoints = listOf(0f, 63f, 160f, 400f, 1000f, 2500f, 6250f, 16000f)
private val DESIRED_CAPTURE_SIZE = 1024 // A typical value, has worked well for me
private lateinit var frequencyOrdinalRanges: List<IntRange>
//...

val captureSizeRange = Visualizer.getCaptureSizeRange().let { it[0]..it[1] }
val captureSize = DESIRED_CAPTURE_SIZE.coerceIn(captureSizeRange)
visualizer.captureSize = captureSize
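val samplingRate = visualizer.samplingRate / 1000 // samplingRate is reported in milliHertz; the math below expects Hz
val lastOrdinal = captureSize / 2 - 1 // the FFT delivers captureSize / 2 buckets
frequencyOrdinalRanges = targetEndpoints.zipWithNext { a, b ->
        val startOrdinal = 1 + (captureSize * a / samplingRate).toInt()
        // The + 1 omits the DC offset in the first range, and the overlap for remaining ranges
        val endOrdinal = (captureSize * b / samplingRate).toInt().coerceAtMost(lastOrdinal)
        startOrdinal..endOrdinal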
    }
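
As a worked check of the arithmetic, here is the same computation as a pure function that can run off-device. bucketRanges is a hypothetical name, the sampling rate is passed in Hz, and 44,100 Hz is an assumed example value. I also clamp the last ordinal to the number of buckets the FFT actually delivers.

```kotlin
/**
 * Maps consecutive frequency endpoints (in Hz) to ranges of FFT bucket ordinals.
 * Hypothetical helper: samplingRateHz must be in Hz, not milliHertz.
 */
fun bucketRanges(endpointsHz: List<Float>, captureSize: Int, samplingRateHz: Int): List<IntRange> {
    val lastOrdinal = captureSize / 2 - 1 // only half the capture size is delivered
    return endpointsHz.zipWithNext { a, b ->
        // The + 1 skips the DC bucket for the first range and the overlap for the others
        val start = 1 + (captureSize * a / samplingRateHz).toInt()
        val end = (captureSize * b / samplingRateHz).toInt().coerceAtMost(lastOrdinal)
        start..end
    }
}
```

For a capture size of 1024 at 44,100 Hz and the endpoints from the question, this yields 1..1, 2..3, 4..9, 10..23, 24..58, 59..145 and 146..371, so the lowest ranges only get one or two buckets each.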

And then in your listener

override fun onFftDataCapture(
    visualizer: Visualizer,
    fft: ByteArray,
    samplingRate: Int
) {
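    // Needs kotlin.math.hypot and kotlin.math.log10 imported
    val output = FloatArray(frequencyOrdinalRanges.size)
    for ((i, ordinalRange) in frequencyOrdinalRanges.withIndex()) {
        var logMagnitudeSum = 0f
        for (k in ordinalRange) {
            val fftIndex = k * 2 // each bucket is a (real, imaginary) byte pair
            val magnitude = hypot(fft[fftIndex].toFloat(), fft[fftIndex + 1].toFloat())
            // Clamp so a silent bucket contributes log10(1) = 0 instead of -Infinity
            logMagnitudeSum += log10(magnitude.coerceAtLeast(1f))
        }
        output[i] = logMagnitudeSum / (ordinalRange.last - ordinalRange.first + 1)
    }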
    // If you want magnitude to be on a 0..1 scale, you can divide it by log10(hypot(127f, 127f))
    // Do something with output
}
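
If you want to try the averaging without a device, the same strategy can be sketched as a pure function over a byte array. This assumes the layout used above (bucket k at indices 2k and 2k + 1, signed bytes) and a 0..1 scale based on 127. bandLevels and MAX_LOG_MAGNITUDE are hypothetical names, and the clamping to 1 for silent buckets is my own choice.

```kotlin
import kotlin.math.hypot
import kotlin.math.log10

// Largest log magnitude assumed for a (real, imaginary) pair of signed bytes.
val MAX_LOG_MAGNITUDE: Float = log10(hypot(127f, 127f))

/** Average log magnitude per range, scaled to 0..1. Empty ranges yield 0. */
fun bandLevels(fft: ByteArray, ranges: List<IntRange>): FloatArray =
    FloatArray(ranges.size) { i ->
        val range = ranges[i]
        if (range.isEmpty()) {
            0f
        } else {
            var sum = 0f
            for (k in range) {
                val magnitude = hypot(fft[2 * k].toFloat(), fft[2 * k + 1].toFloat())
                sum += log10(maxOf(magnitude, 1f)) // silent bucket counts as 0, not -Infinity
            }
            val mean = sum / (range.last - range.first + 1)
            (mean / MAX_LOG_MAGNITUDE).coerceIn(0f, 1f)
        }
    }
```

A bucket holding (3, 4) has magnitude 5 and lands at roughly 0.31 on this scale, a silent bucket at 0, and a bucket holding (127, 127) at 1.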

I did not test any of the above, so there might be errors. Just trying to communicate the strategy.

Tenfour04
  • I already thought about just using those 1024 buckets and get the best out of it, though my code wouldn't look nearly as good as yours :') Still thinking about a more precise solution to the problem, but it works fine and also taught me about IntRanges and some other stuff I didn't know before, so thanks for that! – Riku55 Mar 05 '20 at 11:29
  • I've been looking for an answer to the same question, and I tried your code, but I'm curious: what do you do when `logMagnitudeSum` equals `-Infinity` in terms of visualization? Also, just to make sure: when you compute `frequencyOrdinalRanges`, did you mean to write `startOrdinal downTo endOrdinal`? (I am using Kotlin) – ILikeTacos Jul 22 '22 at 14:54
  • @ILikeTacos It's been a while since I've worked with FFT code. I don't recall infinite values being returned. IIRC, you get unsigned bytes, so they have a finite range up to some max linear power level. – Tenfour04 Jul 24 '22 at 17:04
  • @Tenfour04 actually I forgot to update my comment. The sampling rate returned by this function is in millihertz, while the formulas expect hertz. Dividing the sampling rate by 1000 fixed my `Infinity` issues. Thank you so much for replying after so long! On the other hand, if I want the values to be between 0..1, you are using 127 because of the floating point 127 bias, right? – ILikeTacos Jul 25 '22 at 12:17
  • Yes, (once again only if I recall correctly), they give you the value in a byte (range -128 to 127) so 127 is the largest possible value. Although I can't remember now if the byte they give you is basically wasting all the negative values so it only has a useful range of 0-127, or if it is actually an unsigned byte. If it's an unsigned byte, I think the math above would have to change. You'd have to add 128 to them (for example `fft[fftIndex].toFloat() + 128f`), and then the max value would be `log10(hypot(255f, 255f))`. – Tenfour04 Jul 25 '22 at 13:42
  • It's also worth pointing out that one byte of linear precision comes out to some pretty bad precision after you apply the logarithm. This is the unfortunate result of Android providing very low quality audio for security reasons. Another side note: the typical formula for converting sound pressure to sound pressure level (decibels) is to use `20*log(SP / SPref)`. This is because the log function closely matches how the ear perceives relative sound pressure energy in a linear way. We're doing the same thing here, except the reference is arbitrary so the 20 and the SPref don't matter. – Tenfour04 Jul 25 '22 at 13:51
  • Thank you so much for providing support after years of writing this answer! This has been very helpful!! – ILikeTacos Jul 25 '22 at 13:59