I'm trying to create a Guitar-Hero-like game (something like this) and I want to be able to analyze an audio file given by the user and create levels automatically, but I am not sure how to do that.

I thought maybe I should use a BPM detection algorithm, placing an arrow on each beat and a rail on some recurrent pattern, but I have no idea how to implement those.

Also, I'm using NAudio's BlockAlignReductionStream, which has a Read method that copies byte[] data. But what happens when I read a 2-channel audio file? Does it read 1 byte from the first channel and 1 byte from the second? (It says 16-bit PCM.) And does the same happen with 24-bit and 32-bit float formats?

asheeshr
Symbol

1 Answer

Beat detection (or more specifically BPM detection)

Beat detection algorithm overview using a comb filter:

It looks like the pipeline is roughly: split the signal into frequency bands (FFT), rectify and low-pass filter each band to extract its amplitude envelope, then correlate the envelopes against comb filters at candidate tempos.

Lots of algorithms you'll have to implement here. Comb filters are supposedly slow, though. The wiki article didn't point me at other specific methods.
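To make the comb-filter idea concrete, here's a rough pure-Python sketch (my own illustration, not code from any article). It assumes you already have one band's samples as a list of floats; a real implementation would run this per frequency band and search over beat phase as well:

```python
def envelope(samples, alpha=0.005):
    """Full-wave rectify, then one-pole low-pass filter to get the
    amplitude envelope (the overall trend in loudness)."""
    env, out = 0.0, []
    for s in samples:
        env += alpha * (abs(s) - env)  # smooth toward |s|
        out.append(env)
    return out

def comb_filter_bpm(env, sample_rate, bpm_range=range(60, 181)):
    """Score each candidate BPM by sampling the envelope with an impulse
    train spaced one beat period apart; the best-matching train wins.
    Note: subharmonics (half tempo) tie with this crude scoring, so
    keep the search range fairly tight."""
    best_bpm, best_score = None, -1.0
    for bpm in bpm_range:
        period = sample_rate * 60.0 / bpm  # samples per beat
        idx = [int(round(k * period)) for k in range(int(len(env) / period))]
        score = sum(env[i] for i in idx) / max(1, len(idx))
        if score > best_score:
            best_bpm, best_score = bpm, score
    return best_bpm
```

This is deliberately naive - the comb-filter papers convolve the full signal with each comb and compare output energies, which is where the cost comes from.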

Edit: This article has information on streaming statistical methods of beat detection, which sounds like a great idea: http://www.flipcode.com/misc/BeatDetectionAlgorithms.pdf - I'm betting they run better in real time, though they're less accurate.
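As an illustration (my own sketch, not the article's code), the simplest streaming statistical approach boils down to comparing each window's energy against a short history of recent windows:

```python
from collections import deque

def detect_beats(samples, sample_rate=44100, window=1024, sensitivity=1.3):
    """Streaming energy-based beat detection: flag a beat whenever one
    window's energy jumps well above the average energy of the last
    ~1 second of windows."""
    history = deque(maxlen=max(1, sample_rate // window))  # ~1 s of windows
    beats = []  # beat onset times, in seconds
    for start in range(0, len(samples) - window + 1, window):
        energy = sum(s * s for s in samples[start:start + window])
        if len(history) == history.maxlen:
            average = sum(history) / len(history)
            if energy > sensitivity * average:
                beats.append(start / sample_rate)
        history.append(energy)
    return beats
```

The sensitivity constant is the tunable part; a fancier version scales the threshold by the variance of the energy history so it adapts to the song.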

BTW I just skimmed and pulled out keywords. I've only toyed with FFT, rectification, and attenuation filters (low-pass filter). The rest I have no clue about, but you've got links.

This will all get you the BPM of the song, but it won't generate your arrows for you.

Level generation

As for "place an arrow on a beat and a rail on some recurrent pattern", that is going to be a bit trickier to implement to get good results.

You could go with a more aggressive content extraction approach, and try to pull the notes out of the song.

You'd need to use beat detection for this part too. This may be similar to the BPM detection above, but at a different range, with a band-pass filter for the instrument's range. You would also swap out or remove some parts of the algorithm, and you'd have to process the whole song, since you're no longer detecting a single global BPM. You'd also need some sort of pitch detection.
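Pitch detection is a whole field by itself, but a minimal autocorrelation sketch (my own illustration; real note extraction has to cope with polyphony and harmonics, which this doesn't) looks like:

```python
def detect_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate a frame's fundamental frequency by autocorrelation: the
    lag at which the signal best matches a shifted copy of itself is
    one pitch period."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) // 2)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None
```

You'd run this on short frames (tens of milliseconds) and track how the estimate changes over time to get a note sequence.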

I think this approach will be messy and will guarantee you need to hand-scrub the results for every song. If you're okay with this, and just want to avoid the initial hand transcription work, this will probably work well.

You could also try to go with a content generation approach.

Most procedural content generation has been done in a trial-and-error manner, with people publishing or patenting algorithms that don't completely suck. Often there is no real quantitative analysis that can be done on content generation algorithms, because they generate aesthetics. So you'd just have to pick ones that seem to give pleasing sample results and try them out.

Most algorithms are centered around visual content generation - terrain, architecture, humanoids, plants, etc. There is some research on audio content generation, Generative Music, etc. Your requirements don't perfectly match either of these.

I think algorithms for procedural "dance steps" (if such a thing exists - I only found animation techniques) or Generative Music would be the closest match, if driven by the rhythms you detect in the song.

If you want to go down the composition generation approach, be prepared for a lot of completely different algorithms that are usually just hinted about, but not explained in detail.

Merlyn Morgan-Graham
  • Thank you for the fast answer. Although I don't see any reference to low/high-pass filters in there - I understand what they are, but what is their purpose? How are they helpful? – Symbol Nov 20 '11 at 11:01
  • From the first article link: "Since we are only looking for the tempo of our signal, we need to reduce it to a form where we can see sudden changes in sound. This is done by reducing the signal down to its envelope, which can be thought of as the overall trend in sound amplitude, not the frequencies it carries. Essentially, we take each of our six frequency-banded signals and lowpass filter them". Looks like they do it for each range, so I changed the answer to reflect that. – Merlyn Morgan-Graham Nov 20 '11 at 12:07
  • As for the way NAudio works. It should be a separate question because it is only related to this topic by your project. The answer would be useful for other people with other algorithms they're trying to implement. Plus I have no clue :) – Merlyn Morgan-Graham Nov 20 '11 at 12:11
  • It's a little clearer now. I have a lot to learn, since I never really studied audio, but at least now I've got some reading material. For the recurrent pattern, I thought maybe a neural network and dynamic time warping might be helpful - I didn't realize there were so many algorithms. :o Thank you, very helpful! – Symbol Nov 20 '11 at 12:45
  • @Symbol: There's a *lot* of [Pattern Recognition](http://en.wikipedia.org/wiki/Pattern_recognition) algorithms too. The problem with neural networks is you can't just let it train itself. You have to give it feedback on the quality of any result. It won't create that pattern detection kernel from nothing :) If you want, a simpler approach might be to just do basic beat and beat offset detection, then get a friend to tab out songs for you, or download guitar tabs off the internet (or keyboard scores, or whatever). – Merlyn Morgan-Graham Nov 20 '11 at 12:48
  • @Symbol: I've heard somewhere that [Hidden Markov Model](http://en.wikipedia.org/wiki/Hidden_Markov_model) algorithms might be a good place to start for pattern *detection*. Never used them myself and it all sounds like a bunch of wizardry :) The description makes it seems like it would match stateful patterns well, which musical compositions are. Most songs are based on two layers of simple patterns (chord progression, and musical "mode"), which sounds like it could be replicated with a state machine. This of course is drastic overkill if you just want buttons to hit :) – Merlyn Morgan-Graham Nov 20 '11 at 13:01