Using simple thresholds on energy levels would probably not be robust enough for your use case.
A better approach is to first extract features from the audio stream that are characteristic of the sound of blowing out candles. Then use a machine learning algorithm to train a model on training examples (a set of recordings of the sound you want to recognize). The trained model can then classify snippets of sound coming from the microphone in real time while the application is running.
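To make the idea concrete, here is a minimal sketch of that pipeline in Python. Everything in it is an assumption chosen for illustration: the two features (RMS energy and zero-crossing rate), the nearest-centroid classifier standing in for "a machine learning algorithm", the function names, and the synthetic signals (white noise as a crude stand-in for blowing, sine tones for other sounds). A real system would train on actual recordings and would likely use richer features such as MFCCs.

```python
import math
import random

def frame_features(frame):
    """Return (RMS energy, zero-crossing rate) for one frame of samples."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))

def train_centroids(examples):
    """examples maps label -> list of frames; returns label -> mean feature vector."""
    centroids = {}
    for label, frames in examples.items():
        feats = [frame_features(f) for f in frames]
        centroids[label] = tuple(sum(col) / len(feats) for col in zip(*feats))
    return centroids

def classify(frame, centroids):
    """Return the label whose centroid is closest to the frame's features."""
    feats = frame_features(frame)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))

# Hypothetical synthetic signals, used only because no recordings are at hand.
def make_noise(length, rng, amp=0.5):
    return [rng.uniform(-amp, amp) for _ in range(length)]

def make_tone(length, freq, rate=8000, amp=0.5):
    return [amp * math.sin(2 * math.pi * freq * n / rate) for n in range(length)]

rng = random.Random(0)
training = {
    "blow": [make_noise(512, rng) for _ in range(5)],
    "other": [make_tone(512, 220.0), make_tone(512, 440.0)],
}
centroids = train_centroids(training)

print(classify(make_noise(512, rng), centroids))  # unseen noise frame
print(classify(make_tone(512, 330.0), centroids))  # unseen tone frame
```

In this toy setup the noise and the tones have similar RMS energy, so an energy threshold alone could not tell them apart; it is the zero-crossing rate that separates the two classes. That is the sense in which a trained model on several features can be more robust than a single threshold.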
Given the environmental sounds likely to be present while you blow out candles (birthdays are always noisy, aren't they?), it may be difficult to train a model that is robust against that background noise. This is not a simple problem if you care about accuracy.
It may be doable though:
Forgive the self-promotion, but my company developed an SDK that provides an answer to the question you are asking: "Is there a way to 'teach' an app a sound such that the app can subsequently recognize it?"
I am not sure whether the specific sound of blowing out candles would work, as the SDK was primarily aimed at applications involving somewhat percussive sounds, but it might still work for your case. Here is a link, where you will also find a demo program you can download and try if you like: SampleSumo PSR SDK