
I am researching audio classification, more specifically balanced vs. imbalanced audio datasets. Suppose I have two folders for two dataset classes, car sounds and motorcycle sounds, and each folder contains 1000 .wav files. Does that mean I have a balanced dataset just because the file counts are equal? What if the total size of the .wav files in the car class is 500 MB while the motorcycle class is only 200 MB? And even if both folders are the same size, what if the individual car recordings are longer in duration than the motorcycle clips?
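To make the duration question concrete, here is a rough sketch that totals the playback time per class, using Python's standard-library `wave` module; the `car` and `motorcycle` folder names are placeholders for my two class folders:

```python
import wave
from pathlib import Path

def total_duration_seconds(folder):
    """Sum the playback time of every .wav file in a class folder."""
    total = 0.0
    for path in Path(folder).glob("*.wav"):
        with wave.open(str(path), "rb") as wf:
            total += wf.getnframes() / wf.getframerate()
    return total

# "car" and "motorcycle" are placeholder folder names
for cls in ("car", "motorcycle"):
    print(cls, round(total_duration_seconds(cls), 1), "seconds")
```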

  • Quite a general question, which in any case has nothing to do specifically with `tensorflow` or `tensorflow-datasets`; kindly do not spam irrelevant tags (removed & replaced with `machine-learning` and `imbalanced-data`). – desertnaut Apr 09 '20 at 21:32

1 Answer


A balanced dataset means the same number of examples from each class. Shorter inputs are often zero-padded to a common length so they fit into classifiers that expect fixed-size inputs. I don't have a background in audio, so I can't say whether padding is the norm there, but as long as your network has some way of reconciling different input lengths that does not involve creating more inputs, your dataset is balanced at 1000 vs. 1000.
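To illustrate the padding idea, here is a minimal sketch with NumPy; the 16 kHz sample rate and 5-second target length are arbitrary assumptions for the example, not audio best practice:

```python
import numpy as np

def pad_or_trim(waveform, target_len):
    """Zero-pad (or truncate) a 1-D audio array to exactly target_len samples."""
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))  # append zeros

# Assumed 16 kHz sample rate and a 5-second target length
sample_rate, target_seconds = 16000, 5
clip = np.random.randn(3 * sample_rate)              # stand-in 3-second clip
fixed = pad_or_trim(clip, target_seconds * sample_rate)
print(fixed.shape)                                   # (80000,)
```

This way every clip presented to the classifier has the same shape, so the 1000-vs-1000 file count is the only balance that matters.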

  • Thanks for the great info. Just out of curiosity, are there any limits on the preferable duration of the audio clips, i.e., 1, 5, or fewer or more seconds? – dani Apr 10 '20 at 01:28
  • Like I said, I have never done any machine learning in audio. If I were forced at gunpoint to guess, I would say that 1 second is too little and 5 is probably on the lower end of acceptable. But I'm really just guessing; I'd try to find some similar projects and see how long their clips are. – Milo Knell Apr 10 '20 at 03:58