I'm using an ESP8266 NodeMCU 12-E development board to capture audio from a pre-amplified electret microphone and upload it to the web, where it will be converted to a WAV file. My first thought was to convert the integer values returned by `analogRead(A0)` on the ESP8266 to `String`, then concatenate them into a longer string payload that I can publish to an MQTT broker.
My MQTT subscribers didn't seem to be getting proper sound files: all I heard was a series of rhythmic pops.
I decided to investigate if my code on the ESP8266 board was even capturing things properly. I stripped the code down to these few lines which seem to cause problems:

    #include <ESP8266WiFi.h>

    const char *ssid = "____"; // Change it
    const char *pass = "____"; // Change it

    void setup()
    {
      Serial.begin(115200);
      Serial.println(0); // start
      WiFi.mode(WIFI_STA);
      WiFi.begin(ssid, pass);
    }

    void loop()
    {
      int analog = analogRead(A0);
      if (analog > 255) {
        analog = 255;
      }
      else if (analog < 0) {
        analog = 0;
      }
      Serial.print(String(analog));
      Serial.print(" ");
    }

Here's how I use the code above to produce a wav file to check if the sound is what I expect:
- I start up the ESP8266 development board
- I turn on the Serial Monitor and clear all previous output
- I power up my electret microphone and speak into it
- I power down my electret microphone
- I copy the contents of the Serial Monitor (which is a series of integers) into a text file called `audio.raw`
- I copy `audio.raw` to a linux machine that has ffmpeg installed
- I issue the command `ffmpeg -f u8 -ar 11111 -ac 1 -i audio.raw -y audio.wav` on the linux machine
When I listen to the resulting `audio.wav` file, I hear my voice, but it plays maybe 5-10 times faster than normal. (I also get a lot of noise and distortion, but that might be a separate issue with the input signal quality.)
I then tried changing one line of code, from `Serial.print(String(analog))` to `Serial.print(analog)`, and repeated the steps above. This time, my voice sounds about 2 times faster than normal.
Why does changing this one line from `Serial.print(String(analog))` to `Serial.print(analog)` make such a big difference?
Is it because `String()` is a very expensive operation that takes up a lot of time? When each pass through `loop()` takes longer, does the sketch capture fewer `analogRead(A0)` data points per second? And if I run the same `ffmpeg` command with the same flags, does ffmpeg then satisfy the `-ar 11111` setting by playing those fewer samples back faster than they were recorded? That would imply that my sampling rate depends on the execution speed of my sketch, which would mean I have to account for variable execution speeds across other boards of the same model due to manufacturing tolerances, ambient temperature, and so on.