
I want to build a training dataset for my chatbot with TensorFlow. My corpus file looks like this:

hello!
hello,nice to meet you!
nice to meet you too!
goodbye
bye

I read the corpus file like this:

ds = tf.data.TextLineDataset("./corpus.txt")
ds = ds.map(lambda x: tf.py_func(lambda x: x.lower(), [x], tf.string, stateful=False))
ds = ds.map(lambda x: tf.constant("bos_ ") + x + tf.constant(" _eos"))
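Assuming TensorFlow 2.x, the same preprocessing can be done entirely with built-in string ops. This avoids `tf.py_func`, which is not part of the TF 2 API (only `tf.compat.v1.py_func` remains). The in-memory `CORPUS` list is an assumption used as a stand-in for the file; `tf.data.TextLineDataset("./corpus.txt")` yields the same kind of element (one scalar `tf.string` per line).

```python
import tensorflow as tf

# Sample corpus from the question; in practice use
# tf.data.TextLineDataset("./corpus.txt"), which also yields scalar tf.string lines.
CORPUS = ["hello!", "hello,nice to meet you!", "nice to meet you too!", "goodbye", "bye"]

ds = tf.data.Dataset.from_tensor_slices(CORPUS)
# Lowercase inside the graph instead of calling back into Python with py_func.
ds = ds.map(tf.strings.lower)
# Wrap each line in the start/end markers.
ds = ds.map(lambda x: tf.strings.join(["bos_ ", x, " _eos"]))

for line in ds.as_numpy_iterator():
    print(line.decode())
```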

This gives me a dataset like this:

bos_ hello! _eos
bos_ hello,nice to meet you! _eos
bos_ nice to meet you too! _eos
bos_ goodbye _eos
bos_ bye _eos

But how can I turn it into a dataset of consecutive (input, response) pairs, like this:

('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ goodbye _eos', 'bos_ bye _eos')
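One possible sketch, assuming TensorFlow 2.x: zip the marked-up dataset with a copy of itself shifted by one element using `Dataset.skip(1)`, so every line is paired with the line after it. The `CORPUS` list is again an assumed stand-in for `tf.data.TextLineDataset("./corpus.txt")`.

```python
import tensorflow as tf

# Stand-in for tf.data.TextLineDataset("./corpus.txt").
CORPUS = ["hello!", "hello,nice to meet you!", "nice to meet you too!", "goodbye", "bye"]

ds = tf.data.Dataset.from_tensor_slices(CORPUS)
ds = ds.map(lambda x: tf.strings.join(["bos_ ", tf.strings.lower(x), " _eos"]))

# Pair each line with the line that follows it. zip stops when the shorter
# (shifted) dataset runs out, so N lines give N - 1 pairs.
pairs = tf.data.Dataset.zip((ds, ds.skip(1)))

for context, response in pairs.as_numpy_iterator():
    print((context.decode(), response.decode()))
```

With a `TextLineDataset` source, the file is simply read twice (once per branch of the zip), which is usually fine for a small corpus.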

Further, how can I make a dataset where each input accumulates all of the previous lines, like this:

('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello! hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! goodbye _eos', 'bos_ bye _eos')
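A sketch of one way to do this, assuming a TensorFlow 2 release that has `Dataset.scan` (older releases expose the same transformation as `tf.data.experimental.scan`): carry the running history as the scan state, emit `(history, current line)` for every line, drop the first element (which has no history), and only then add the `bos_`/`_eos` markers. The helper names `accumulate` and `add_markers` are hypothetical, and `CORPUS` again stands in for the corpus file.

```python
import tensorflow as tf

# Stand-in for tf.data.TextLineDataset("./corpus.txt").
CORPUS = ["hello!", "hello,nice to meet you!", "nice to meet you too!", "goodbye", "bye"]

lines = tf.data.Dataset.from_tensor_slices(CORPUS).map(tf.strings.lower)

def accumulate(history, line):
    # history holds every previous line joined by spaces ("" before the first line).
    new_history = tf.where(
        tf.equal(history, ""),
        line,
        tf.strings.join([history, line], separator=" "),
    )
    # Emit (history so far, current line) and carry the extended history forward.
    return new_history, (history, line)

def add_markers(context, response):
    # Wrap both sides of the pair in the start/end markers.
    return (tf.strings.join(["bos_ ", context, " _eos"]),
            tf.strings.join(["bos_ ", response, " _eos"]))

pairs = (
    lines.scan(tf.constant("", dtype=tf.string), accumulate)
    .skip(1)  # the first line has no history, so it is never used as an input
    .map(add_markers)
)

for context, response in pairs.as_numpy_iterator():
    print((context.decode(), response.decode()))
```

Note that the input string grows with every line, so for a long corpus you would probably want to reset or truncate the history (for example at conversation boundaries).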
Bing Xie
