Textsum (TensorFlow): AssertionError when using a vocab file generated from the dataset

Question

I'm having a slight issue running the model on CNN data. The vocabulary file generated using the code above gives an assertion error, and I can't work out what is causing it.

This is the error I get:

Traceback (most recent call last):
  File "/home/umair/sumModel/bazel-bin/textsum/seq2seq_attention.runfiles/__main__/textsum/seq2seq_attention.py", line 213, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/home/umair/sumModel/bazel-bin/textsum/seq2seq_attention.runfiles/__main__/textsum/seq2seq_attention.py", line 165, in main
    assert vocab.CheckVocab(data.SENTENCE_START) > 0
AssertionError

The function in seq2seq_attention.py:

def main(unused_argv):
  vocab = data.Vocab(FLAGS.vocab_path, 10000000)
  # Check for presence of required special tokens.
  assert vocab.CheckVocab(data.PAD_TOKEN) > 0
  assert vocab.CheckVocab(data.UNKNOWN_TOKEN) >= 0
  assert vocab.CheckVocab(data.SENTENCE_START) > 0
  assert vocab.CheckVocab(data.SENTENCE_END) > 0

- Osama Jamil, Jan 04 '17 at 04:38
The default vocabulary file and the generated vocabulary file have the same format, but the generated one has more entries than the default. - Osama Jamil, Jan 04 '17 at 04:42

score 0 · Answer 1 · answered Jan 22 '17 at 21:53

Does your vocabulary file contain these special tokens? The failing assertion checks SENTENCE_START, so your vocabulary is most likely missing at least that one, i.e. '<s>'.

# Special tokens
PARAGRAPH_START = '<p>'
PARAGRAPH_END = '</p>'
SENTENCE_START = '<s>'
SENTENCE_END = '</s>'
UNKNOWN_TOKEN = '<UNK>'
PAD_TOKEN = '<PAD>'
DOCUMENT_START = '<d>'
DOCUMENT_END = '</d>'

source: https://github.com/tensorflow/models/blob/master/textsum/data.py
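
To check a generated vocabulary file before training, something like the sketch below may help. It assumes the vocab file holds one whitespace-separated "word count" pair per line and that ids follow line order. That is how I understand data.py reads the file, but verify it against your copy. The helper names (find_missing_tokens, append_missing_tokens) and the placeholder count of 1 are made up for illustration and are not part of textsum.

```python
# Special tokens checked by the assertions in seq2seq_attention.py
# (values copied from the data.py excerpt quoted above).
REQUIRED_TOKENS = ['<PAD>', '<UNK>', '<s>', '</s>']


def find_missing_tokens(vocab_path, required=REQUIRED_TOKENS):
    """Return the required tokens that never appear as the first field of a line.

    Assumes one whitespace-separated "word count" pair per line.
    """
    seen = set()
    with open(vocab_path) as f:
        for line in f:
            pieces = line.split()
            if pieces:
                seen.add(pieces[0])
    return [tok for tok in required if tok not in seen]


def append_missing_tokens(vocab_path, missing, count=1):
    """Append each missing token to the end of the file.

    The count of 1 is a placeholder (an assumption); appending rather than
    prepending keeps the new tokens off the first line of a non-empty file.
    """
    with open(vocab_path) as f:
        content = f.read()
    with open(vocab_path, 'a') as f:
        # Make sure the new entries start on their own line.
        if content and not content.endswith('\n'):
            f.write('\n')
        for tok in missing:
            f.write('%s %d\n' % (tok, count))


if __name__ == '__main__':
    import sys
    path = sys.argv[1]  # path to your generated vocab file
    missing = find_missing_tokens(path)
    print('Missing special tokens: %s' % missing)
    if missing:
        append_missing_tokens(path, missing)
```

Run it with the path to your vocab file as the only argument. Note that the assertions in main() use > 0 for PAD_TOKEN, SENTENCE_START and SENTENCE_END, so a token that is present but sits on the very first line would still fail if ids start at 0.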
