I have been able to successfully run the pre-trained TextSum model (TensorFlow 1.2.1). The output consists of summaries of CNN/Daily Mail articles (which are chunked into bin format prior to testing).
I have also been able to create the aforementioned bin-format test data for the CNN/Daily Mail articles and the vocab file (per the instructions here). However, I am not able to create my own test data to check how good the summaries are. I have tried modifying the make_datafiles.py
code to remove the hard-coded values. I am able to create the tokenized files, but the next step seems to be failing. It would be great if someone could help me understand what url_lists
is being used for. Per the GitHub README:
"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."
How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? My best guess, sketched below, is that the story filename is simply the SHA-1 hex digest of the full URL, but I haven't been able to confirm it.
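For example, if my reading of the script's hashhex function is correct, this should print the filename of the Germanwings story (again, the .encode() is my Python 3 addition):

```python
import hashlib

url = ("http://web.archive.org/web/20150401100102id_/"
       "http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/")

# hashhex(url) from make_datafiles.py: SHA-1 hex digest of the raw URL string
story_fname = hashlib.sha1(url.encode('utf-8')).hexdigest() + ".story"
print(story_fname)  # should match a filename under the stories directory, if I'm right
```

If someone has had success creating their own test data this way, please do let me know how to go about it. Thanks in advance!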