How can a program take an input phrase and synthesise it from .wav files, with pauses at punctuation?

Question

I am trying to take an input phrase and synthesise it from the .wav files in a given folder. If the phrase contains a comma, the next part should start after a 300 ms pause; after a period or question mark, it should start after a 400 ms pause. I started with the code below, but I don't know how to initialise the synthesiser, join the phrase together, and play the wav files.
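The pause rule itself can be written as a lookup table plus a conversion from milliseconds to a number of samples. This is a minimal sketch: `PAUSE_MS`, `pause_ms` and `silence_samples` are hypothetical names, the 48000 Hz default mirrors the `AS.Audio(rate=48000)` call in the question, and giving `!` a 400 ms pause is an assumption, since the stated rule only covers commas, periods and question marks.

```python
# Pause lengths in milliseconds, following the rule in the question.
# "!" -> 400 ms is an assumption; the rule only names ",", "." and "?".
PAUSE_MS = {",": 300, ".": 400, "?": 400, "!": 400}


def pause_ms(token):
    """Pause after `token` in ms, or 0 if the token is not punctuation."""
    return PAUSE_MS.get(token, 0)


def silence_samples(ms, rate=48000):
    """Number of silent samples (per channel) for `ms` milliseconds at `rate` Hz."""
    return rate * ms // 1000


if __name__ == "__main__":
    for token in ["bye", ",", "bye", "?"]:
        ms = pause_ms(token)
        print(token, ms, silence_samples(ms))
```

At 48 kHz a comma therefore needs 14400 samples of silence and a period or question mark 19200.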

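import argparse
import os
import re

# AS is the audio helper module used in the question; its import line is not shown.

parser = argparse.ArgumentParser(description="Synthesise a phrase from .wav files")
parser.add_argument('--phones', '-p', action="store", required=True,
                    help="folder containing one .wav file per phone")
parser.add_argument('--play', '-a', action="store_true", default=False,
                    help="play the result")
parser.add_argument('phrase', help="phrase to synthesise, e.g. 'bye, bye!'")
args = parser.parse_args()

class Synthesis(object):
    def __init__(self, wav_folder):
        self.phones = {}  # phone name (file name without extension) -> AS.Audio
        self.get_wavs(wav_folder)

    def get_wavs(self, wav_folder):
        for root, dirs, files in os.walk(wav_folder, topdown=False):
            for file in files:
                if not file.endswith('.wav'):
                    continue  # skip anything that is not a .wav file
                filename = os.path.splitext(file)[0]
                self.phones[filename] = AS.Audio()
                self.phones[filename].load(os.path.join(root, file))

def punctuation(phrase):
    """Split a phrase into word and punctuation tokens,
    e.g. 'bye, bye!' -> ['bye', ',', 'bye', '!']."""
    # Alternatives: a word (letters/apostrophes) or a single punctuation mark.
    return re.findall(r"[a-z']+|[,.!?]", phrase.lower())

if __name__ == "__main__":
    S = Synthesis(wav_folder=args.phones)
    output = AS.Audio(rate=48000)
    print(punctuation(args.phrase))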
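One way to do the joining step, sketched with Python's standard `wave` module rather than the `AS` module (whose API isn't shown beyond `Audio()`, `load()` and the `rate` argument). `synthesise` is a hypothetical helper: it takes a list mixing .wav paths and pause lengths in milliseconds, takes the output format from the first file, and assumes every file shares the same channel count, sample width and sample rate.

```python
import wave


def synthesise(items, out_path):
    """Concatenate .wav files and silences into a single .wav file.

    `items` mixes paths to .wav files and ints (pause length in ms).
    All input files must share channel count, sample width and rate.
    """
    wav_paths = [item for item in items if not isinstance(item, int)]
    if not wav_paths:
        raise ValueError("need at least one .wav file to take the format from")
    with wave.open(wav_paths[0], "rb") as first:
        params = first.getparams()

    frame_bytes = params.nchannels * params.sampwidth
    # 8-bit WAV is unsigned (silence = 0x80); wider formats are signed (silence = 0).
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"

    chunks = []
    for item in items:
        if isinstance(item, int):
            # Pause: append the right number of silent frames.
            n_frames = params.framerate * item // 1000
            chunks.append(silent_byte * (n_frames * frame_bytes))
        else:
            with wave.open(item, "rb") as w:
                # Compare (nchannels, sampwidth, framerate) with the first file.
                if w.getparams()[:3] != params[:3]:
                    raise ValueError(f"{item}: format differs from {wav_paths[0]}")
                chunks.append(w.readframes(w.getnframes()))

    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count in the header is fixed up on write
        out.writeframes(b"".join(chunks))
```

For example, `synthesise(["wavs/b.wav", "wavs/aa.wav", 300, "wavs/b.wav"], "out.wav")` (hypothetical paths) writes two phones, a 300 ms pause and a third phone into `out.wav`, which can then be played with any audio player.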

So you are trying to produce canned speech, rather than actual speech synthesis? You have one .wav file per word of the language you are working with? — lenz, Dec 11 '15 at 22:55
Yes. For example, given "bye, bye!" it should convert the words to the CMUdict phone sequence with the pause, so it sounds like "b aa y iy (pause) b aa y iy". — jack15, Dec 11 '15 at 23:12
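Turning "bye, bye!" into a phone sequence with pauses is a tokenise-then-look-up loop. A minimal sketch: `LEXICON` is a hypothetical one-word dictionary holding the phones from the comment above, pauses are represented as integers (milliseconds) following the question's rule, and the 400 ms after `!` is an assumption.

```python
import re

# Hypothetical mini-lexicon using the phones from the comment above;
# a real system would load these from a pronunciation dictionary.
LEXICON = {"bye": ["b", "aa", "y", "iy"]}

# Pause lengths in ms from the question; "!" -> 400 ms is an assumption.
PAUSE_MS = {",": 300, ".": 400, "?": 400, "!": 400}


def to_sequence(phrase, lexicon=LEXICON):
    """Turn a phrase into a list of phone names and pause lengths (ints, ms)."""
    sequence = []
    for token in re.findall(r"[a-z']+|[,.!?]", phrase.lower()):
        if token in PAUSE_MS:
            sequence.append(PAUSE_MS[token])
        else:
            # A KeyError here means the word is missing from the lexicon.
            sequence.extend(lexicon[token])
    return sequence


if __name__ == "__main__":
    print(to_sequence("bye, bye!"))
```

Each phone name can then be mapped to its .wav file, and each integer to that many milliseconds of silence.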
Ok, that sounds more like actual speech synthesis. Do you have a grapheme-to-phoneme converter, i.e. a set of rules that maps e.g. "bye" to the phonemes "b aa y iy"? Until you have that, I don't think you need to worry about pauses and the like. — lenz, Dec 12 '15 at 10:43
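A simple stand-in for a grapheme-to-phoneme converter is dictionary lookup, which is what CMUdict provides: each line maps a word to its phones. Note that this swaps in lookup for the letter-to-sound rules described above, so words missing from the dictionary still fail. `load_lexicon` and `g2p` are hypothetical helpers; the parser assumes CMUdict-style lines (word, whitespace, phones with stress digits, `;;;` comments, alternate pronunciations marked `WORD(1)`), and the phone symbols it produces must match the .wav file names for the result to be usable.

```python
import re


def load_lexicon(lines):
    """Parse CMUdict-style lines ("WORD  PH1 PH2 ...") into {word: [phones]}.

    Keeps the first pronunciation of each word (later "WORD(1)" entries
    are skipped), strips stress digits and lowercases everything.
    """
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()
        if word in lexicon:
            continue  # keep only the first pronunciation
        lexicon[word] = [re.sub(r"\d", "", p).lower() for p in phones]
    return lexicon


def g2p(word, lexicon):
    """Look a word up; raise a clear error for out-of-vocabulary words."""
    try:
        return lexicon[word.lower()]
    except KeyError:
        raise KeyError(f"no pronunciation for {word!r}") from None
```

In practice `load_lexicon` would be called on an open dictionary file, e.g. `load_lexicon(open(path))`, with `path` pointing at a local copy of the dictionary.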


0 Answers