I want to build a training dataset for my chatbot with TensorFlow. My corpus file looks like this:
hello!
hello,nice to meet you!
nice to meet you too!
goodbye
bye
I read the corpus file, lowercase each line, and wrap it in start and end markers:
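import tensorflow as tf

ds = tf.data.TextLineDataset("./corpus.txt")
# Lowercase each line (py_func passes the line to Python as bytes).
ds = ds.map(lambda x: tf.py_func(lambda x: x.lower(), [x], tf.string, stateful=False))
# Wrap each line in the bos_ / _eos markers.
ds = ds.map(lambda x: tf.constant("bos_ ") + x + tf.constant(" _eos"))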
This gives me a dataset like this:
bos_ hello! _eos
bos_ hello,nice to meet you! _eos
bos_ nice to meet you too! _eos
bos_ goodbye _eos
bos_ bye _eos
But how can I make a dataset where each line is paired with the line that follows it, like this:
('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ goodbye _eos', 'bos_ bye _eos')
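To make the target concrete, here is a minimal plain-Python sketch of the pairing I am after. It is not tf.data code; the helper names `wrap` and `consecutive_pairs` and the in-memory `corpus` list are made up for illustration only.

```python
def wrap(line):
    # Hypothetical helper: lowercase a line and add the bos_ / _eos markers.
    return "bos_ " + line.lower() + " _eos"


def consecutive_pairs(lines):
    # Pair every wrapped line with the wrapped line that follows it.
    wrapped = [wrap(line) for line in lines]
    return list(zip(wrapped, wrapped[1:]))


# Stand-in for the contents of corpus.txt (assumed to be one utterance per line).
corpus = [
    "hello!",
    "hello,nice to meet you!",
    "nice to meet you too!",
    "goodbye",
    "bye",
]

pairs = consecutive_pairs(corpus)
for source, target in pairs:
    print((source, target))
```

I assume something like `tf.data.Dataset.zip((ds, ds.skip(1)))` could express the same one-line shift inside the input pipeline, but I have not confirmed that it behaves as I expect.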
Further, how can I make a dataset where the first element accumulates all previous lines, like this:
('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello! hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! goodbye _eos', 'bos_ bye _eos')
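Again in plain Python rather than tf.data, this is roughly the accumulation I mean. The function name `cumulative_pairs` and the in-memory `corpus` list are hypothetical, and the sketch assumes the whole corpus is one conversation that fits in memory.

```python
def cumulative_pairs(lines):
    # For each line after the first, the context is every earlier line joined
    # by spaces inside a single bos_ ... _eos wrapper; the reply is that line
    # wrapped on its own.
    lowered = [line.lower() for line in lines]
    pairs = []
    for i in range(1, len(lowered)):
        context = "bos_ " + " ".join(lowered[:i]) + " _eos"
        reply = "bos_ " + lowered[i] + " _eos"
        pairs.append((context, reply))
    return pairs


# Stand-in for the contents of corpus.txt (assumed to be one utterance per line).
corpus = [
    "hello!",
    "hello,nice to meet you!",
    "nice to meet you too!",
    "goodbye",
    "bye",
]

cumulative = cumulative_pairs(corpus)
for context, reply in cumulative:
    print((context, reply))
```

What I am looking for is a way to get the same result from the `TextLineDataset` pipeline itself, without first loading every line into a Python list.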