
There is a body of literature on categorizing sounds where the possible matches are any sound found in the modern world (for instance: http://projects.csail.mit.edu/soundnet/). This question is different in that it is limited to searching just a handful of specific sounds, recorded and trained locally. It is about the feasibility of coding a mobile application that would record and convert a small set of sounds (say, fewer than 10), then be able to "listen" for and identify those sounds.

In this similar, unanswered SO question, the author gives the sound of a doorbell as an example. My example is a bit different in that I'd like to categorize vocalizations of dogs. I might define "fido bark", "rover bark", "fido whine", and "rover whine": four buttons when the app is in training mode. The dogs would make their sounds, and the human user would categorize each one. The app would then be switched to listening mode, and when a certain dog made a certain vocalization, the app would match the sound and display which dog and which vocalization occurred.

Is it feasible to code an application such as the one outlined above on a typical mobile device, without external processing? If so, how?

Dale
  • Comments on how to improve the question, or on why it should be deleted if you find a problem with it, would be appreciated. – Dale Sep 04 '20 at 23:24
  • Not a downvoter here, but I am guessing your question can be answered by either a 'yes' or 'no'. And my answer would be 'yes' (yes, it is feasible). Since that's all you asked for ;) – DaveIdito Oct 01 '20 at 20:02
  • I've added "If so, how?", which I'd hoped, given the brainpower here, would be obvious. – Dale Oct 10 '20 at 19:40

2 Answers


It's doable. I found an article that deploys a sound-based bird classification model to iOS using the Core ML and Skafos libraries: Detecting Bird Sounds with Create ML, CoreML3, and Skafos.

So it can be done with dogs as well, assuming you've got the data and a trained model.
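
If you'd rather train outside Create ML (e.g. in Keras) and still target iOS, coremltools can convert a trained model for on-device use. A minimal sketch, assuming a trained tf.keras classifier; the variable names and dog-vocalization labels are placeholders for this question's example, not anything from the linked article:

```python
import coremltools as ct

# Assume `keras_model` is a trained tf.keras sound classifier (not shown here).
class_labels = ["fido bark", "rover bark", "fido whine", "rover whine"]

# Convert to a Core ML classifier so predictions come back as labelled classes.
mlmodel = ct.convert(
    keras_model,
    classifier_config=ct.ClassifierConfig(class_labels),
)
mlmodel.save("DogSounds.mlmodel")  # add to the Xcode project and call from Swift
```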

OfirD
  • Very cool. The origin appears to be a bird audio detection challenge out of machine-listening.eecs.qmul.ac.uk – Dale Oct 10 '20 at 19:46

Performing audio analysis on a mobile device requires the same techniques as offline analysis (typically: spectrograms, frequency shifts, a CNN classifier, ensembling), but under the tighter resource and time constraints of a mobile device.
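
As a rough illustration of that pipeline, a minimal sketch of a small CNN classifier over fixed-size spectrogram "images" in tf.keras might look like the following; the input shape, layer sizes, and the four dog-vocalization classes are placeholders, not values taken from any of the linked articles.

```python
import tensorflow as tf

NUM_CLASSES = 4              # e.g. fido bark, rover bark, fido whine, rover whine
INPUT_SHAPE = (128, 128, 1)  # placeholder: mel bins x time frames x 1 channel

# Small CNN that treats each spectrogram as a single-channel image.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=INPUT_SHAPE),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would use labelled spectrograms collected in "training mode":
# model.fit(train_spectrograms, train_labels, epochs=20, validation_split=0.2)
```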

Training the model is probably best done offline; only then would the model be deployed to the mobile device. On mobile devices there are often efficient libraries for image matching and comparison. By converting audio to a spectrogram, those same comparison techniques can be leveraged.
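
As a sketch of that audio-to-spectrogram conversion on the offline/training side, assuming librosa is available (the file name and parameters below are placeholders):

```python
import librosa
import numpy as np

# Load one labelled training clip (path is a placeholder for this example).
y, sr = librosa.load("fido_bark_01.wav", sr=16000)

# Convert to a log-scaled mel spectrogram: the "image" the classifier
# (or an image-comparison library) would operate on.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

print(S_db.shape)  # (n_mels, n_frames)
```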

More specifically, training offline with TensorFlow and deploying to Android is described in this Net Guru blog post: Audio Classification with Machine Learning – Implementation on Mobile Devices. That post also describes the more involved steps required to deploy the model to iOS. Additionally, jLibrosa is an open-source library that helps implement some of the audio-processing steps on-device.
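
For the TensorFlow-to-Android path the linked post covers, the offline half typically ends with a TensorFlow Lite conversion along these lines (a hedged sketch; `model` is assumed to be a trained tf.keras classifier such as the one sketched above):

```python
import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# Save the flatbuffer so it can be bundled as an asset in the Android app.
with open("dog_sounds.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file would then be bundled with the Android app and run through the TensorFlow Lite Interpreter, with jLibrosa (or a similar library) producing the same spectrogram features on-device.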

Vasanthkumar Velayudham has written several articles that are a good place to start understanding the landscape of apps in this realm, for instance on heartbeat.fritz.ai and on medium.com.

Dale