I think the mammal with the best-characterized vocal tract is the human.
So, use any of the open-source speech synthesis libraries and, as a first step, just feed it syllables that you pick based on your data. (In fact, that sounds like what your brain does when you speak... just less elaborate. Pick 8 syllables, transmit 1 of them: tadah, 3 bits per syllable.)
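To make the "8 syllables, 3 bits" idea concrete, here is a minimal sketch of the mapping between data and syllables. The syllable table and the function names are made up for illustration; any 8 syllables that your synthesizer pronounces distinctly would do, and the resulting list would simply be handed to whatever speech synthesis library you picked.

```python
# Hypothetical table: 8 distinct syllables, so each one carries 3 bits.
SYLLABLES = ["ba", "do", "ki", "lu", "me", "na", "po", "ti"]


def bytes_to_syllables(data):
    """Turn bytes into a list of syllables, 3 bits per syllable."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into 3-bit groups.
    bits += "0" * (-len(bits) % 3)
    return [SYLLABLES[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]


def syllables_to_bytes(syllables, n_bytes):
    """Inverse mapping; n_bytes tells the receiver where the padding starts."""
    bits = "".join(f"{SYLLABLES.index(s):03b}" for s in syllables)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


if __name__ == "__main__":
    spoken = bytes_to_syllables(b"hi")
    print(" ".join(spoken))
    print(syllables_to_bytes(spoken, 2))
```

The receiver needs to know the payload length (or some framing convention) to discard the padding bits; that part is left out here.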
You'd later look into the speech synthesis tool and understand how it works: there are coefficients going from a text-to-phoneme converter to a synthesizer; these coefficients describe which parts of a human vocal tract are active when pronouncing the phoneme, and at what intensity.
You're then free to generate coefficients to your liking, and have a voice synthesized that's not quite in any human language.
What you'd be doing is basically feeding a vocoder (decoder) with coefficients generated by you – so that's a different thing you can do:
- Use a vocoder. That's a compression codec for voice. There are many: every phone these days compresses voice. I'd recommend Codec2, Opus, or its predecessor, Speex.
- Compress some voice at a reasonable bit rate.
- Modify the resulting coefficients with steganography.
- Transmit the compressed data like a phone would.
- Receive the data.
- Apply your steganographic knowledge to get the hidden data back.
- (Optionally) decode at the other end to get the original, only slightly distorted voice back.
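
The "modify the coefficients" and "get the hidden data back" steps can be sketched with plain least-significant-bit embedding. This is only one possible steganographic scheme, picked here for brevity, and the frame below is a made-up list of quantized integers, not the actual bitstream layout of Codec2, Opus, or Speex; with a real codec you'd have to find out which fields tolerate small changes.

```python
def embed_bits(coeffs, payload_bits):
    """Overwrite the least-significant bit of the first coefficients with payload bits."""
    if len(payload_bits) > len(coeffs):
        raise ValueError("payload does not fit into this frame")
    out = list(coeffs)
    for i, bit in enumerate(payload_bits):
        # Clear the lowest bit, then set it to the payload bit.
        out[i] = (out[i] & ~1) | bit
    return out


def extract_bits(coeffs, n_bits):
    """Read the hidden bits back from the least-significant bits."""
    return [c & 1 for c in coeffs[:n_bits]]


if __name__ == "__main__":
    frame = [12, 7, 33, 40, 5]   # hypothetical quantized vocoder coefficients
    secret = [1, 0, 1, 1]
    stego = embed_bits(frame, secret)
    print(stego)
    print(extract_bits(stego, len(secret)))
```

Each touched coefficient changes by at most one quantization step, which is why the decoded voice comes out only slightly distorted.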