In the AppleScript Script Editor:
set diphones to {"Dah", "Di", "Du", "Beh", "Bi", "Burr"} --etc.
set targetFolder to ((choose folder) as text)
repeat with p in diphones
say p using "Vicki" pitch 55 modulation 0 saving to (targetFolder & p & ".aif")
end repeat
Then convert the files to WAV.
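One way to do the WAV conversion without leaving AppleScript is to shell out to afconvert, which ships with macOS. This is only a sketch: the handler names wavCommandFor and convertAll are made up for illustration, and it assumes the .aif files were written with the same folder and names as in the loop above.

```applescript
-- Hypothetical helper (not part of the original script): builds the shell
-- command that converts <folder><name>.aif to <folder><name>.wav.
on wavCommandFor(hfsFolder, baseName)
	-- afconvert expects POSIX paths, so convert from the colon-separated form
	set aifPath to POSIX path of (hfsFolder & baseName & ".aif")
	set wavPath to POSIX path of (hfsFolder & baseName & ".wav")
	-- -f WAVE selects the WAV container, -d LEI16 16-bit little-endian samples
	return "afconvert -f WAVE -d LEI16 " & quoted form of aifPath & " " & quoted form of wavPath
end wavCommandFor

-- Hypothetical helper: run the conversion for every name in the list.
on convertAll(hfsFolder, names)
	repeat with n in names
		do shell script wavCommandFor(hfsFolder, contents of n)
	end repeat
end convertAll
```

With the variables from the script above, the call would be convertAll(targetFolder, diphones). Building the command in its own handler makes it easy to log the string and check the paths before anything is actually converted.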
There are a few other options available in the "say" command dictionary.
I don't think it is as simple as that, however. How the speech synth treats these diphones can be weird, and it can even differ according to which voice you use. You may have to manipulate quite a few sounds to get them the way you want. For example, Vicki says "Di" like "DEE" and "Bi" like "BYE". It is really hard to get those voices to intone a short "i" (as in "big") as just the diphone. It may even be necessary to have it say "big" (for example), then edit the sound in Audacity: cut off the end, put a fade-out at the end of the edited version, and export that. I just did this and it works, but yeah, you'll need to do some special-case adjustments.
If you have the Developer tools, there is also an app called "Repeat After Me" which allows you to "tune" spoken text, but (surprisingly) it doesn't help for the situation I just described. (It is pretty powerful for larger chunks, though.)
[edit] So, yes, the phonetic-input version of the above could look like this:
set diphones to {"dAO", "dIH", "dAX", "bEH", "bIH", "brr"} --etc., changed to be phonetic based on Apple's system
set targetFolder to ((choose folder) as text)
repeat with p in diphones
say ("[[inpt PHON]]" & p & "[[inpt TEXT]]") using "Vicki" pitch 52 modulation 0 saving to (targetFolder & p & ".aif")
end repeat
[ADDENDUM]
Years ago Apple's voices would all act the same, and you could tune any voice to perfectly sing a song (I did the "Star Spangled Banner" one night). Then, for some reason, the developers not only changed the voices, but took away the consistency so that some voices behave completely differently compared to others. I wasn't happy about this.
Consider the following:
Using the default voice ("Alex"), the following utterance comes out (you'll be encouraged to find) as even as can be:
say "[[inpt TUNE]] d {D 114; P 95.0:100} UW {D 227; P 95.0:100} 1IY {D 382; P 95.0:100} . {D 30} [[inpt TEXT]]" using "Alex"
But if you use "Cellos" or "Pipe Organ", you get that bizarre lift at the end, even if you use this TUNE mode. Don't ask me why. So how did I get this to work, at least for "Alex"? I used the aforementioned "Repeat After Me" app and simplified the "tuned" output. I think you can probably get what you want using some variation of TUNE and PHON. But you'll probably have to stay away from "Cellos" and "Pipe Organ" because they are problematic for making a monotone intonation (although they may be fine for certain diphones/triphones). And maybe you'll have to use both modes, which is, I know, annoying. I feel your pain.
One more variation. Notice the way the following "rate" tag forces a longer utterance:
say "[[rate - 66]] [[inpt TUNE]] d {D 114; P 95.0:100} UW {D 227; P 95.0:100} 1IY {D 382; P 95.0:100} . {D 30} [[inpt TEXT]]" using "Alex"
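If you end up writing a lot of these flat-pitch TUNE strings by hand, a small handler can assemble them for you. This is a sketch only: monotoneTune is a made-up name, and it assumes (as in the strings above) that each phoneme gets a duration in milliseconds and a single pitch point at 100% of its length, followed by a short trailing pause. The pitch is passed as text so that AppleScript's number formatting can't change it.

```applescript
-- Hypothetical helper: build a flat-pitch TUNE string.
-- phonemes is a list of {symbol, durationMs} pairs; pitchText (e.g. "95.0")
-- is used unchanged for every phoneme.
on monotoneTune(phonemes, pitchText)
	set body to ""
	repeat with ph in phonemes
		-- unpack one {symbol, duration} pair
		set {sym, dur} to contents of ph
		set body to body & sym & " {D " & dur & "; P " & pitchText & ":100} "
	end repeat
	-- close with the short pause and switch back to normal text input
	return "[[inpt TUNE]] " & body & ". {D 30} [[inpt TEXT]]"
end monotoneTune

-- Rebuilds the "Alex" utterance shown above
say monotoneTune({{"d", 114}, {"UW", 227}, {"1IY", 382}}, "95.0") using "Alex"
```

To slow it down, prepend the rate tag to the result, e.g. "[[rate - 66]] " & monotoneTune(...).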
[ADDENDUM II]
Ah, but check this out. This evens out "Pipe Organ": it gets rid of the end lift by forcing a baseline pitch change ("pbas") before the last phoneme:
say "[[rate - 66]] [[inpt TUNE]] d {D 114; P 95.0:100} UW {D 227; P 95.0:100} [[pbas - 5]] 1IY {D 382; P 95.0:100} . {D 30} [[inpt TEXT]]" using "Pipe Organ"
They're making us work way too hard here :-)
Here's a simplified version, going back to your original but sticking that pbas in there:
say "[[inpt TUNE]] d UW [[pbas - 5]] 1IY [[inpt TEXT]]" using "Pipe Organ"