The best answer here is: Don't use DirectSound unless you are using Windows XP.
If you are playing only a single sound at a time and you don't care about any real-time mixing, you can use something as trivial as PlaySound. I'm assuming that you actually want real-time mixing and the ability to play multiple sounds that overlap.
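For the trivial case, a fire-and-forget call looks something like this minimal sketch (the file name is just a placeholder):

```cpp
// Minimal sketch: play a single .wav file with the Win32 PlaySound API.
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

int main()
{
    // SND_FILENAME: first argument is a path; SND_ASYNC: return immediately.
    PlaySound(TEXT("explosion.wav"), nullptr, SND_FILENAME | SND_ASYNC);

    Sleep(2000); // keep the process alive long enough for the async sound to finish
    return 0;
}
```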
For Windows 8 or Windows 8.1 you can use XAudio 2.8, and for Windows 10 you can use XAudio 2.9; both are built into the operating system. Otherwise, you can use XAudio 2.7, which is part of the legacy DirectX SDK.
See Learning XAudio2 for some educational resources.
See DirectX Tool Kit for Audio for a simple C++ wrapper for XAudio2.
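For reference, the raw XAudio2 path (without the Tool Kit wrapper) looks roughly like the sketch below. It assumes you've already loaded PCM data into memory; the format values and the helper name are just illustrative, and error cleanup is omitted:

```cpp
// Rough sketch of playing a preloaded PCM buffer with XAudio 2.8+.
#include <windows.h>
#include <xaudio2.h>
#pragma comment(lib, "xaudio2.lib")

HRESULT PlayPcm(const BYTE* pcmData, UINT32 pcmBytes)
{
    // XAudio2 expects COM to be initialized on this thread.
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IXAudio2* xaudio2 = nullptr;
    HRESULT hr = XAudio2Create(&xaudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);
    if (FAILED(hr)) return hr;

    // The mastering voice is the final mix that feeds the audio device.
    IXAudio2MasteringVoice* masterVoice = nullptr;
    hr = xaudio2->CreateMasteringVoice(&masterVoice);
    if (FAILED(hr)) return hr;

    // Describe the source data (assumed here: 44.1 kHz, 16-bit, stereo PCM).
    WAVEFORMATEX wfx = {};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 2;
    wfx.nSamplesPerSec  = 44100;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    // Each source voice is an independent 'voice' that gets mixed in real time.
    IXAudio2SourceVoice* sourceVoice = nullptr;
    hr = xaudio2->CreateSourceVoice(&sourceVoice, &wfx);
    if (FAILED(hr)) return hr;

    // Queue the whole sound as a single 'packet' and start the voice.
    XAUDIO2_BUFFER buffer = {};
    buffer.pAudioData = pcmData;
    buffer.AudioBytes = pcmBytes;
    buffer.Flags      = XAUDIO2_END_OF_STREAM;
    sourceVoice->SubmitSourceBuffer(&buffer);
    return sourceVoice->Start();
}
```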
RE: DirectSound buffers
To your original question: back in the old days of Windows 95, the 'primary buffer' was the actual audio buffer submitted to the hardware. The 'secondary buffers' were where you created your individual 'voices' for playing back more than one sound at a time. The system then mixed all the secondary buffers into the primary buffer for playback.
Since the transition to NT, however, the 'primary buffer' isn't really there anymore. There is something called a 'primary' buffer, but it's mostly there for BackCompat. All the buffers are mixed into a single buffer by DirectSound and then fed to the system for playback. On Windows Vista or later, that mix is fed to the Windows Audio Session API (WASAPI), which mixes the sounds from the system and all running applications before anything actually reaches the audio hardware.
You can use WASAPI directly, but the API is quite restrictive because it doesn't do any application-level mixing or source-rate conversion. Generally you only use WASAPI directly if you are writing an audio engine that has already done all the required conversions and mixing and just wants to play a final mix.
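To give a sense of how bare-bones it is, here is a rough shared-mode WASAPI sketch (error handling and cleanup omitted, and the function name is mine). Note that GetMixFormat dictates the format; your data must already be converted to match it:

```cpp
// Rough sketch: open the default shared-mode render endpoint with WASAPI.
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#pragma comment(lib, "ole32.lib")

HRESULT StartSharedModeRender()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    IAudioClient* client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

    // The shared-mode engine dictates the format; you must supply audio in it.
    WAVEFORMATEX* mixFormat = nullptr;
    client->GetMixFormat(&mixFormat);
    client->Initialize(AUDCLNT_SHAREMODE_SHARED, 0,
                       10000000 /* 1 second, in 100-ns units */, 0,
                       mixFormat, nullptr);

    IAudioRenderClient* render = nullptr;
    client->GetService(__uuidof(IAudioRenderClient), (void**)&render);

    // Grab the endpoint buffer and fill it (here: silence), then release it.
    UINT32 frames = 0;
    client->GetBufferSize(&frames);
    BYTE* data = nullptr;
    render->GetBuffer(frames, &data);
    render->ReleaseBuffer(frames, AUDCLNT_BUFFERFLAGS_SILENT);

    return client->Start();
}
```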
In any case, the reason there are two sets of pointers when dealing with Lock is that the buffer is a "ring buffer", a.k.a. a "circular buffer". In the olden days of Windows 95, parts of the primary buffer would actually be played out by the hardware at the same time you were writing into the buffer ahead of the current playback position. You had this complicated two-pointer setup to avoid overwriting data that was still being played; otherwise you got the dreaded 'popping' or 'glitching' in your sound playback. Since this never happens anymore on modern versions of Windows, it's all just there for BackCompat with respect to the primary buffer. That said, the DirectSound mixer still makes use of the fact that secondary buffers are "ring buffers", so the same mechanism guards against the real-time mixer reading as you write 'ahead' if you happen to be updating a playing buffer. If a secondary buffer is not playing, you can safely just pass nullptr for the second pointer and size.
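A typical Lock/Unlock write on a secondary buffer therefore handles both spans, something like this sketch (error handling trimmed, helper name is mine):

```cpp
// Sketch of the two-pointer Lock/Unlock pattern on a DirectSound secondary
// buffer. Because the buffer is a ring, the locked region may wrap past the
// end, in which case the second pointer/size pair covers the wrapped portion.
#include <windows.h>
#include <dsound.h>
#include <cstring>

HRESULT WriteToSecondaryBuffer(IDirectSoundBuffer8* buffer,
                               DWORD writeOffset,
                               const BYTE* src, DWORD srcBytes)
{
    void* part1 = nullptr; DWORD part1Bytes = 0;
    void* part2 = nullptr; DWORD part2Bytes = 0;

    HRESULT hr = buffer->Lock(writeOffset, srcBytes,
                              &part1, &part1Bytes,
                              &part2, &part2Bytes, 0);
    if (FAILED(hr))
        return hr;

    // First span: from the lock offset up to (at most) the end of the ring.
    memcpy(part1, src, part1Bytes);

    // Second span: only non-null if the locked region wrapped back to the start.
    if (part2)
        memcpy(part2, src + part1Bytes, part2Bytes);

    return buffer->Unlock(part1, part1Bytes, part2, part2Bytes);
}
```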
This old-school "ring buffer" model was complicated to work with, and it mattered more when system memory was quite limited. Pretty much all modern sound APIs are 'packet' based instead: each playing voice has a queue of pending buffers, and you add more data by submitting a new buffer to the queue for processing. You can get a notification when a buffer completes, so you know the audio in that 'packet' has been processed.
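With XAudio2, for instance, the queue looks roughly like this (assuming a source voice created as in the earlier sketch; the helper is just illustrative):

```cpp
// Sketch of the 'packet' model with XAudio2: each SubmitSourceBuffer call
// appends a packet to the voice's queue, and GetState reports how many
// packets are still pending.
#include <windows.h>
#include <xaudio2.h>

void QueuePackets(IXAudio2SourceVoice* voice,
                  const BYTE* const packets[], const UINT32 packetBytes[],
                  UINT32 packetCount)
{
    for (UINT32 i = 0; i < packetCount; ++i)
    {
        XAUDIO2_BUFFER buffer = {};
        buffer.pAudioData = packets[i];
        buffer.AudioBytes = packetBytes[i];
        if (i == packetCount - 1)
            buffer.Flags = XAUDIO2_END_OF_STREAM; // last packet of this stream
        voice->SubmitSourceBuffer(&buffer);       // appended to the voice's queue
    }

    voice->Start();

    // The voice works through the queue in order; BuffersQueued tells you how
    // many packets have not yet finished playing.
    XAUDIO2_VOICE_STATE state = {};
    voice->GetState(&state);
    // state.BuffersQueued, state.pCurrentBufferContext, ...
}
```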
Also, in DirectSound you had to copy the audio data into the memory provided by Lock, but modern 'packet'-based APIs avoid the extra copy by reading the source data directly out of your application memory. This does add the complication that you need to ensure the source memory remains available until all playback has stopped (i.e., you can't free the memory while it's still being read by the real-time mixer, or your application will crash), but in return you avoid a lot of extra copying.
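In XAudio2 the usual way to handle that lifetime is the voice callback: pass the allocation as the packet's pContext and only free it in OnBufferEnd. A rough sketch under that assumption (the callback object must be passed to CreateSourceVoice when the voice is created; names are mine):

```cpp
// Sketch of the memory-lifetime rule: the bytes handed to SubmitSourceBuffer
// are read in place by the mixer, so they must stay valid until the packet has
// finished. Here pContext identifies the allocation so OnBufferEnd can free it.
#include <windows.h>
#include <xaudio2.h>

struct ReleaseOnBufferEnd : public IXAudio2VoiceCallback
{
    void STDMETHODCALLTYPE OnBufferEnd(void* pBufferContext) override
    {
        // The mixer is done with this packet; only now is it safe to free it.
        delete[] static_cast<BYTE*>(pBufferContext);
    }

    // The remaining notifications are not needed for this sketch.
    void STDMETHODCALLTYPE OnVoiceProcessingPassStart(UINT32) override {}
    void STDMETHODCALLTYPE OnVoiceProcessingPassEnd() override {}
    void STDMETHODCALLTYPE OnStreamEnd() override {}
    void STDMETHODCALLTYPE OnBufferStart(void*) override {}
    void STDMETHODCALLTYPE OnLoopEnd(void*) override {}
    void STDMETHODCALLTYPE OnVoiceError(void*, HRESULT) override {}
};

// 'ownedData' was allocated with new BYTE[bytes]; ownership passes to the
// callback above, which must have been registered via CreateSourceVoice.
void SubmitOwnedPacket(IXAudio2SourceVoice* voice, BYTE* ownedData, UINT32 bytes)
{
    XAUDIO2_BUFFER buffer = {};
    buffer.pAudioData = ownedData;
    buffer.AudioBytes = bytes;
    buffer.pContext   = ownedData;   // handed back to OnBufferEnd when done
    voice->SubmitSourceBuffer(&buffer);
}
```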