I record bird cries with two microphones. The recordings can run up to 3 hours, and it is time-consuming to listen to the whole file in Audacity each day. What I want is a script that takes my original file and gives me a bunch of short audio files, each containing one bird cry. My microphones can record in MP3 or WAV. The script should keep only cries with a frequency higher than n Hz; everything below that is fixed background sound that should not be saved. I don't know which language is best for this, and I have absolutely no idea how to go about it.

Thank you all, Thomas

Totog1nger
  • Although this is a very interesting project, the question is way too broad to tackle here. Good luck, though! – Chris W. Jan 28 '19 at 19:30

2 Answers

This should be fairly easy to do in a variety of languages, but Python is a decent place to start. I'll link some relevant resources to get you started, and then you can narrow your question if you run into problems.

To read your audio file in .wav format look at this documentation.

To take the data from your audio file and put it into a numpy array see this question and answer.

Here is the documentation for computing the Fourier transform of your data (to get the frequency content).
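As a rough illustration of getting frequency content out of the transform, here is a minimal sketch assuming the audio is already a mono numpy array. The helper name `dominant_frequency` and the synthetic 3000 Hz test tone are made up for the example; `np.fft.rfft` and `np.fft.rfftfreq` are the real numpy calls.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC bin of a mono signal."""
    spectrum = np.abs(np.fft.rfft(samples))
    # rfftfreq gives the frequency of each bin: k * sample_rate / len(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC component (bin 0, 0 Hz)
    return float(freqs[np.argmax(spectrum)])

# Synthetic stand-in for a recording: one second of a 3000 Hz tone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 3000.0 * t)

bin_spacing = sample_rate / len(tone)  # Hz between neighbouring bins
peak_hz = dominant_frequency(tone, sample_rate)
```

The longer the chunk you transform, the finer the frequency resolution, since the spacing between bins is the sample rate divided by the number of samples.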

I would suggest taking a moving window and computing the Fourier transform of the data within that window and then saving the result to a file if there's significant content above your threshold frequency. The first link should have info on saving the audio file.
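The moving-window idea could be sketched roughly like this. It is only a starting point under several assumptions: the recording is already loaded as a mono float array scaled to [-1, 1], windows do not overlap, and a window counts as a cry when most of its energy sits above the threshold. The names `find_segments` and `write_wav` and the parameters `window_s`, `min_ratio` and `min_power` are hypothetical and would need tuning on real recordings.

```python
import wave
import numpy as np

def find_segments(samples, sample_rate, threshold_hz,
                  window_s=0.5, min_ratio=0.5, min_power=0.0):
    """Return (start, end) sample indices of runs of consecutive windows
    whose energy lies mostly above threshold_hz."""
    win = int(window_s * sample_rate)
    freqs = np.fft.rfftfreq(win, d=1.0 / sample_rate)
    high = freqs >= threshold_hz  # bins above the background cutoff
    segments, start = [], None
    for i in range(0, len(samples) - win + 1, win):
        power = np.abs(np.fft.rfft(samples[i:i + win])) ** 2
        high_power = power[high].sum()
        # min_power is an assumed loudness floor so quiet hiss is not kept
        active = high_power > min_power and high_power >= min_ratio * power.sum()
        if active and start is None:
            start = i                      # a run of active windows begins
        elif not active and start is not None:
            segments.append((start, i))    # the run ended at this window
            start = None
    if start is not None:                  # run still open at end of file
        segments.append((start, i + win))
    return segments

def write_wav(path, samples, sample_rate):
    """Write a mono float signal in [-1, 1] as 16-bit PCM WAV."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm.tobytes())

# Synthetic stand-in: 3 s of 100 Hz hum with a 3000 Hz "cry" from 1 s to 2 s.
sample_rate = 8000
t = np.arange(3 * sample_rate) / sample_rate
signal = 0.2 * np.sin(2 * np.pi * 100.0 * t)
call = (t >= 1.0) & (t < 2.0)
signal[call] += 0.8 * np.sin(2 * np.pi * 3000.0 * t[call])

segments = find_segments(signal, sample_rate, threshold_hz=1000.0)

# To save each detected cry as its own file:
# for n, (start, end) in enumerate(segments):
#     write_wav(f"cry_{n:03d}.wav", signal[start:end], sample_rate)
```

On a real recording you would probably also want to pad each segment by a fraction of a second on both sides and merge segments that are very close together, so a single cry is not split across files.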

You can get some background on using the Fourier transform for this type of application from this Q&A and if it turns out that your problem is really difficult, I would suggest looking into some of the methods for speech detection.

For a more out-there suggestion, you could try frequency-shifting your recording by adjusting the sample rate so that bird sounds resemble human speech, and then use a black-box tool like Google's VAD to pick out the bird calls. I'm not sure how well that would work, though.

John
  • Thank you very much. Let's get into a coding night ;) – Totog1nger Jan 28 '19 at 20:03
  • Opening the file and putting it into an array was simple, but I don't know what to do next. I don't know how to get the frequencies from the discrete Fourier transform, or how to write the matching cries out to new WAV files. Can you help me a bit more? This is really complex for me, and you seem to understand it much better than I do. – Totog1nger Jan 28 '19 at 22:21
  • The first value you get out of the FFT is usually the DC component (0 Hz), and each subsequent value steps up in frequency by the sampling rate divided by the size of the FFT (the total number of bins), so bin k sits at k × sampling rate / FFT size. – John Jan 29 '19 at 18:44
  • Nah, that was way too hard for me :/ The team will keep doing their analysis by listening to the sounds in Audacity. I'll work on an interface for them to record their data instead, which is much easier for me ^^ – Totog1nger Feb 17 '19 at 20:09

The problem of cutting up a long file into sections of interest is usually referred to as (automatic) Audio Segmentation. If you are willing to have fixed-length audio clips as output (say 10 seconds each), you can also treat it as an Audio Classification problem. The latter is a very well-studied problem, and it has also been applied to birds.

The DCASE2018 challenge had one task on Bird Detection, with lots of advanced methods among the submissions. Basically all the best-performing systems use a Convolutional Neural Network on log-scaled mel-spectrograms. A mel-spectrogram is 2D, so the task basically becomes image classification. Many of the submissions are open source, so you can look at the code and play with them. Do note that they are mostly focused on scoring well in a research competition, not on being practical tools for splitting a few files.
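To give an idea of what such a network takes as input, here is a rough numpy-only sketch of a log-scaled mel-spectrogram. It is not taken from any DCASE submission: the function names, the HTK-style mel formula and the parameters `n_fft`, `hop` and `n_mels` are assumptions chosen for illustration, and in practice an audio library would normally do this for you.

```python
import numpy as np

def hz_to_mel(hz):
    # One common (HTK-style) mel formula; other variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                  hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(samples, sample_rate, n_fft=1024, hop=512, n_mels=40):
    """Return a (n_mels, n_frames) array in decibels for a mono signal."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sample_rate, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)

# Synthetic stand-in for a clip: one second of a 3000 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 3000.0 * t)

log_mel = log_mel_spectrogram(clip, sample_rate)
strongest_band = int(np.argmax(log_mel.mean(axis=1)))
```

The resulting 2D array (mel bands by time frames) is what gets fed to the network as if it were a grayscale image.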

If you want to build your own model for this, I would recommend starting with a Convolutional Neural Network pretrained on images, then training it further on the DCASE2018 data, and then testing it on your own data. That should give a very accurate system, though it will take a while to set up.

Jon Nordby