
I'm using an ESP8266 NodeMCU 12-E development board to capture audio from a pre-amplified electret microphone, and then I upload it to the web, where it will be converted to a wav file. My first thought was to convert the integer values of analogRead(A0) on the ESP8266 to String, then concatenate them into a longer string payload that I can publish to an MQTT broker.

My MQTT client subscribers didn't seem to be receiving proper sound files: all I heard was a series of rhythmic pops.

I decided to investigate whether my code on the ESP8266 board was even capturing audio properly. I stripped the code down to these few lines, which seem to cause the problems:

#include <ESP8266WiFi.h>

const char *ssid =  "____";  // Change it
const char *pass =  "____";  // Change it

void setup()
{
  Serial.begin(115200);
  Serial.println(0);      //start
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, pass);
}


void loop()
{
    int analog = analogRead(A0);

    if (analog > 255) {
      analog = 255;
    }
    else if (analog < 0){
      analog = 0;
    }

    Serial.print(String(analog));
    Serial.print(" ");

}

Here's how I use the code above to produce a wav file to check if the sound is what I expect:

- I start up the ESP8266 development board
- I turn on the Serial Monitor and clear all previous output
- I power up my electret microphone and speak into it
- I power down my electret microphone
- I copy the contents of the Serial Monitor (which is a series of integers) into a text file called `audio.raw`
- I copy `audio.raw` to a linux machine that has ffmpeg installed
- I issue the command `ffmpeg -f u8 -ar 11111 -ac 1 -i audio.raw -y audio.wav` on the linux machine
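One detail of this pipeline is worth making explicit: `-f u8` tells ffmpeg that every byte of `audio.raw` is one sample, so a file of ASCII digits and spaces copied from the Serial Monitor is not raw audio (the text "255 " is four bytes, i.e. four "samples"). Below is a minimal host-side sketch, assuming the Serial Monitor output is whitespace-separated decimal integers, that converts the text into one byte per sample before running ffmpeg. The helper names `text_to_u8` and `write_raw` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse whitespace-separated decimal integers (as printed
// by Serial.print) into one unsigned 8-bit sample per value, clamped to 0..255.
std::vector<std::uint8_t> text_to_u8(std::istream& in) {
    std::vector<std::uint8_t> samples;
    long value = 0;
    while (in >> value) {  // stops at end of input or at non-numeric text
        if (value < 0) value = 0;
        if (value > 255) value = 255;
        samples.push_back(static_cast<std::uint8_t>(value));
    }
    return samples;
}

// Convenience overload for text already held in memory.
std::vector<std::uint8_t> text_to_u8(const std::string& text) {
    std::istringstream in(text);
    return text_to_u8(in);
}

// Hypothetical helper: write the samples as raw bytes, the layout that
// `ffmpeg -f u8 -ac 1` reads. Returns false if the file could not be written.
bool write_raw(const std::vector<std::uint8_t>& samples, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size()));
    return static_cast<bool>(out);
}
```

The converted file can then go through the same ffmpeg command; the `-ar` value still has to match the rate at which the samples were actually captured.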

When I listen to the audio.wav file, I hear my voice, but the speed is maybe 5 to 10 times faster than normal. (I also get a lot of noise and distortion, but that might be a separate issue with the input signal quality.)

I then tried changing this one line of code from Serial.print(String(analog)) to Serial.print(analog) and repeated the steps above. This time, my voice sounds about 2 times faster than normal.

Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?

Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points? And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play? Which would imply that my sampling rate is dependent on execution speed of my script? Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?

John

2 Answers


Your sampling rate is coupled to your loop implementation (as you have discovered). This will also cause jitter in your sampling rate as different code paths will take different amounts of time and interrupt service routines will also steal CPU cycles.

This jitter will be one of the causes of distortion in your output.

When I listen to the audio.wav file, I hear my voice, but the speed is maybe 5-10 times faster than normal.

The ESP8266 has a hardware UART, so the code can potentially load the UART's FIFO buffer faster than the UART can transmit it. This could account for the perceived faster sampling rate, but it will also cause jitter or data loss once the buffer fills up: depending on the implementation, a full buffer either drops data or blocks (which causes jitter).

Why does changing this one line from Serial.print(String(analog)) to Serial.print(analog) make such a big difference?

Is it because the String() function is a very expensive operation that takes up a lot of time? And when the script needs more time to process each line of code, the script then has less time to capture enough analogRead(A0) data points?

Yes, yes and yes.

One of the reasons for the performance difference is that String() involves allocating and managing memory on the heap to store the characters.

Serial.print(analog) uses a fixed-size buffer on the stack, since the code knows the maximum number of characters required to display an int.
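To illustrate the fixed-buffer approach, here is a minimal sketch of formatting a number in decimal into a buffer on the stack, with no heap allocation at all. It is similar in spirit to a Print-style number formatter but is not the Arduino core's actual source; `format_decimal` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

// Enough room for the base-2 digits of any unsigned long plus '\0'; decimal
// output always needs fewer characters, so this size is a safe upper bound.
constexpr std::size_t kBufSize = sizeof(unsigned long) * CHAR_BIT + 1;

// Formats `value` in decimal into `out` without touching the heap.
// Digits are produced right-to-left in a local (stack) buffer, then copied.
// Returns the number of characters written, excluding the terminator.
std::size_t format_decimal(unsigned long value, char* out, std::size_t out_size) {
    char buf[kBufSize];
    char* const end = buf + kBufSize - 1;
    char* p = end;
    *p = '\0';
    do {
        *--p = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    const std::size_t len = static_cast<std::size_t>(end - p);
    assert(out_size > len);  // caller must supply room for digits + '\0'
    std::memcpy(out, p, len + 1);
    return len;
}
```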

And if I run the same ffmpeg command using all the same flags, then ffmpeg will try to meet the -ar 11111 requirement by speeding up the audio play?

Yes. ffmpeg assumes that the samples have a fixed sampling rate, but this does not match the rate at which the samples are actually being printed out.
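The size of the speed-up follows from a simple ratio. A small sketch with hypothetical function names: if you time a recording and count the samples it produced, you can estimate the real sampling rate and the speed-up you will hear when ffmpeg is told a different `-ar`.

```cpp
#include <cassert>

// When samples captured at `actual_rate_hz` are played back as if they were
// captured at `assumed_rate_hz` (ffmpeg's -ar), playback is time-compressed
// by this factor: > 1 means faster (and higher-pitched) than real life.
double playback_speed_factor(double assumed_rate_hz, double actual_rate_hz) {
    assert(actual_rate_hz > 0.0);
    return assumed_rate_hz / actual_rate_hz;
}

// Estimates the real sampling rate of a capture: samples recorded divided by
// the wall-clock duration of the recording in seconds.
double estimated_rate_hz(unsigned long sample_count, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(sample_count) / seconds;
}
```

Passing the estimated rate to `-ar` would remove most of the speed-up, but it cannot remove the jitter described above, because the estimate is only an average.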

Which would imply that my sampling rate is dependent on execution speed of my script?

Yes!

Which means I have to consider variable execution speeds across other boards of the same model due to variability in manufacturing precision, environmental temperature, etc...?

Yes. There will be a multitude of variables that affect execution speeds.

What can you do?

Decouple the sampling of data from the code execution.

This can be done by implementing an Interrupt Service Routine (ISR). Tie the ISR to a hardware timer so that it executes at a fixed sampling rate, which avoids the jitter.

The ISR can write to a buffer that the code in loop() transmits over the serial connection. The ISR and the serial transmission code need to manage the buffer so that neither overruns the other. One way of doing this is to use alternate buffers (double buffering): the ISR fills one buffer while loop() transmits the other, and they swap when the ISR's buffer is full.
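A minimal host-side sketch of that double-buffering scheme in standard C++. The class name `PingPongBuffer` and its methods are hypothetical, not an existing library. On the board, `push()` would be called from the timer ISR and `take()`/`release()` from `loop()`; this sketch uses `std::atomic` for the hand-over flag, and whether that, `volatile`, or briefly disabling interrupts is the right choice on the ESP8266 is an assumption to check against your toolchain.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Double ("ping-pong") buffer for one producer (the sampling ISR) and one
// consumer (the serial sender in loop()).
template <std::size_t N>
class PingPongBuffer {
public:
    // Producer side. Returns false if the sample had to be dropped because
    // both halves are busy (the consumer has not released the full one yet).
    bool push(std::uint8_t sample) {
        if (fill_ == N && !try_swap()) {
            return false;  // overrun: serial link is slower than sampling
        }
        bufs_[active_][fill_++] = sample;
        if (fill_ == N) {
            try_swap();  // hand the full half to the consumer if it is free
        }
        return true;
    }

    // Consumer side. Returns the full half (N samples), or nullptr if none yet.
    const std::uint8_t* take() const {
        if (!ready_.load(std::memory_order_acquire)) return nullptr;
        return bufs_[ready_index_].data();
    }

    // Consumer side: call once the half returned by take() has been sent.
    void release() {
        assert(ready_.load(std::memory_order_relaxed));  // only after take()
        ready_.store(false, std::memory_order_release);
    }

private:
    // Publishes the active (full) half and switches to the other half, but
    // only if the consumer has released the previously published one.
    bool try_swap() {
        if (ready_.load(std::memory_order_acquire)) return false;
        ready_index_ = active_;
        ready_.store(true, std::memory_order_release);
        active_ ^= 1u;
        fill_ = 0;
        return true;
    }

    std::array<std::array<std::uint8_t, N>, 2> bufs_{};
    std::size_t active_ = 0;       // half the producer is writing into
    std::size_t fill_ = 0;         // samples written into the active half
    std::size_t ready_index_ = 0;  // half currently published to the consumer
    std::atomic<bool> ready_{false};
};
```

Dropping a sample when both halves are busy (rather than blocking) keeps the ISR short and its timing fixed; a growing count of failed `push()` calls is a sign that the serial link cannot keep up with the chosen sampling rate.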

Ben T
  • So something like the Ticker.h library is what I can use to asynchronously capture and write analog data to a buffer? I just read about that library here: https://www.google.com/amp/s/circuits4you.com/2018/01/02/esp8266-timer-ticker-example/amp/ – John Feb 08 '19 at 15:28
  • Ticker.h uses the os_timer API which is still software based and only has precision down to 500 microseconds. You need to use the hardware timers instead. Try https://techtutorialsx.com/2017/10/07/esp32-arduino-timer-interrupts/ which references the example in https://github.com/espressif/arduino-esp32/blob/master/libraries/ESP32/examples/Timer/RepeatTimer/RepeatTimer.ino – Ben T Feb 08 '19 at 23:28

Since you use Serial.begin(115200), the ESP8266 will transfer 115200 bits per second through the serial port. With the default 8N1 framing, each byte is sent as 10 bits (1 start bit, 8 data bits, 1 stop bit), so that is at most 115200 / 10 = 11520 bytes per second. Since you use the u8 (unsigned 8-bit) format for audio, each sample consists of a single byte, so just change the ffmpeg -ar parameter to 11520.

I don't have any microphone that I can connect to an MCU for testing, but it should work properly this way. The -ac parameter is already correct, since this is mono (single-channel) audio.
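The serial arithmetic can be checked with a few lines of standard C++. This is a sketch with hypothetical helper names; the results are upper bounds that assume the loop keeps the UART continuously busy and nothing else limits throughput. It also shows how much text encoding costs: at up to 4 bytes per sample ("255 "), the same link carries a quarter as many samples as it does raw bytes.

```cpp
#include <cassert>

// UART bits on the wire per byte: 1 start bit + data bits + parity + stop bits.
// Serial.begin(115200) defaults to 8N1: 1 + 8 + 0 + 1 = 10 bits per byte.
constexpr unsigned bits_per_frame(unsigned data_bits, unsigned parity_bits,
                                  unsigned stop_bits) {
    return 1 + data_bits + parity_bits + stop_bits;
}
static_assert(bits_per_frame(8, 0, 1) == 10, "an 8N1 frame is 10 bits");

// Upper bound on payload bytes per second for a given baud rate and framing.
unsigned bytes_per_second(unsigned baud, unsigned frame_bits) {
    assert(frame_bits > 0);
    return baud / frame_bits;
}

// Upper bound on samples per second when each sample costs `bytes_per_sample`
// bytes on the wire: 1 for a raw byte, up to 4 for ASCII text such as "255 ".
unsigned max_samples_per_second(unsigned baud, unsigned frame_bits,
                                unsigned bytes_per_sample) {
    assert(bytes_per_sample > 0);
    return bytes_per_second(baud, frame_bits) / bytes_per_sample;
}
```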

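Edit: Also, don't use the String() constructor when printing to Serial.

With String(), the sound speeds up about 5 times because the value is sent as text: a 1-byte value such as 255 becomes the 3 characters "2", "5", "5" (plus the separating space). Note that Serial.print(analog) also sends the number as ASCII digits; to send a single raw byte per sample, use Serial.write(analog) instead. As long as the loop keeps the serial port busy, you don't have to consider the execution speed of the microcontroller, since the port outputs the 115200 bits per second you configured. You just need to consider how many bytes each sample takes.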

Next, delete the line

Serial.print(" ");

Also change

int analog = analogRead(A0);

to

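byte analog = analogRead(A0) >> 2;  // scale the 10-bit reading (0..1023) to 8 bits (0..255)

since each u8 sample should be exactly one byte, while an int is 4 bytes on the ESP8266. The ESP8266's analogRead() returns a 10-bit value (0 to 1023), so shifting right by 2 bits scales it into the 0 to 255 range; a plain (byte) cast would keep only the low 8 bits and wrap around.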
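The difference between truncating and scaling a 10-bit reading is easy to see on the host. A small sketch with hypothetical function names, assuming readings in the 0 to 1023 range that the ESP8266's `analogRead(A0)` returns. Note that clamping to 255 (as in the question's code) has a related problem: every reading above 255 becomes the same value, flattening the upper three quarters of the range, which may contribute to the distortion mentioned in the question.

```cpp
#include <cassert>
#include <cstdint>

// Truncating cast: keeps only the low 8 bits, so 256 wraps to 0 and 300 to 44.
// Loud (large) samples turn into small ones, which sounds like harsh distortion.
std::uint8_t truncate_to_u8(int reading) {
    return static_cast<std::uint8_t>(reading);
}

// Scaling: drop the 2 least-significant bits, mapping 0..1023 onto 0..255
// while preserving the shape of the waveform.
std::uint8_t scale_to_u8(int reading) {
    assert(reading >= 0 && reading <= 1023);
    return static_cast<std::uint8_t>(reading >> 2);
}
```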

After changing int to byte, you can also get rid of this code block:

if (analog > 255) {
  analog = 255;
}
else if (analog < 0){
  analog = 0;
}

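If the ESP8266 is connected over USB to a Linux machine that has ffmpeg installed, you can use

ttylog -b 115200 -d /dev/ttyUSB0 | ffmpeg -f u8 -ar 11520 -ac 1 -i - -y audio.wav

to capture audio data from the ESP8266 in real time (-ar 11520 matches the 115200 / 10 bytes per second that an 8N1 serial link at 115200 baud can carry).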

yildizmehmet