I have two strings. First I tokenize the first string and dump the tokenizer into a pickle file, file.pickle. Then I want to tokenize the second string and dump that tokenizer into the same file.pickle. I am using the code below:
```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import pickle

text1 = ['I was able to save the model to a file and load from a file.']
tokenizer_t1 = Tokenizer(num_words=10, lower=True)
tokenizer_t1.fit_on_texts(text1)
with open('file.pickle', 'wb') as handle:
    pickle.dump(tokenizer_t1, handle)

with open('file.pickle', 'rb') as handle:
    tokenizer_txt1 = pickle.load(handle)
print("word_index : ", tokenizer_txt1.word_index)
# word_index : {'to': 1, 'a': 2, 'file': 3, 'i': 4, 'was': 5, 'able': 6, 'save': 7, 'the': 8, 'model': 9, 'and': 10, 'load': 11, 'from': 12}

text2 = ['Tokenizer class has a function to save data into JSON format. The accepted answer clearly demonstrates the tokenizer.']
tokenizer_t2 = Tokenizer(num_words=10, lower=True)
tokenizer_t2.fit_on_texts(text2)
with open('file.pickle', 'ab') as handle:
    pickle.dump(tokenizer_t2, handle)

with open('file.pickle', 'rb') as handle:
    tokenizer_txt2 = pickle.load(handle)
print("word_index : ", tokenizer_txt2.word_index)
# word_index : {'to': 1, 'a': 2, 'file': 3, 'i': 4, 'was': 5, 'able': 6, 'save': 7, 'the': 8, 'model': 9, 'and': 10, 'load': 11, 'from': 12}
```
When I read file.pickle back, I get this output:
word_index : {'to': 1, 'a': 2, 'file': 3, 'i': 4, 'was': 5, 'able': 6, 'save': 7, 'the': 8, 'model': 9, 'and': 10, 'load': 11, 'from': 12}
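This output is what `pickle` does with an appended file: opening it with `'ab'` writes a second, independent pickle after the first one, and each `pickle.load` call reads exactly one object from the current position. A freshly opened file therefore always returns the first tokenizer. The sketch below reads every object in such a file by calling `pickle.load` until it raises `EOFError`. Plain dicts stand in for the two `Tokenizer` objects so it runs without TensorFlow, and `load_all` is a hypothetical helper name.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every object pickled into `path`, in the order it was dumped."""
    with open(path, 'rb') as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                # End of file: no more pickled objects to read
                return


# Plain dicts stand in for tokenizer_t1 and tokenizer_t2 (assumption for brevity)
first = {'to': 1, 'a': 2}
second = {'tokenizer': 1, 'the': 2}

path = os.path.join(tempfile.mkdtemp(), 'file.pickle')
with open(path, 'wb') as handle:
    pickle.dump(first, handle)
with open(path, 'ab') as handle:  # appends a second, separate pickle
    pickle.dump(second, handle)

with open(path, 'rb') as handle:
    print(pickle.load(handle))  # a single load returns only the first object

objects = list(load_all(path))
print(len(objects))  # both objects are in the file
```

Applied to the question's file, the second element of that list would be `tokenizer_t2`, but it would still be a separate tokenizer with its own `word_index`, not a merged one.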
But my desired output is:
{'to': 1, 'a': 2, 'file': 3, 'i': 4, 'was': 5, 'able': 6, 'save': 7, 'the': 8, 'model': 9, 'and': 10, 'load': 11, 'from': 12, 'tokenizer': 13, 'class': 14, 'has': 15, 'function': 16, 'data': 17, 'into': 18, 'json': 19, 'format': 20, 'accepted': 21, 'answer': 22, 'clearly': 23, 'demonstrates': 24}
It should contain every unique token from both strings, each exactly once. How can I do this in Python?
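One way to get a single `word_index` that keeps the first string's indices and appends unseen words from the second string is to merge the dictionaries yourself, then pickle only the result with mode `'wb'`. Refitting one `Tokenizer` on both texts would probably not reproduce this exact numbering, because Keras rebuilds `word_index` by overall word frequency on each `fit_on_texts` call. This is a minimal sketch under some assumptions: `merge_word_index` and `simple_tokens` are hypothetical helpers, and `simple_tokens` only roughly approximates Keras's default filtering (lowercase, punctuation removed, split on whitespace) so the example runs without TensorFlow.

```python
import os
import pickle
import re
import tempfile


def simple_tokens(text):
    """Rough stand-in for Keras's default tokenization (assumption):
    lowercase, replace punctuation with spaces, split on whitespace."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def merge_word_index(base, new_words):
    """Return a copy of `base` where each word in `new_words` that is not
    already present gets the next free index (hypothetical helper)."""
    merged = dict(base)
    next_index = max(merged.values(), default=0) + 1
    for word in new_words:
        if word not in merged:
            merged[word] = next_index
            next_index += 1
    return merged


# word_index of the first tokenizer, copied from the question's output
word_index1 = {'to': 1, 'a': 2, 'file': 3, 'i': 4, 'was': 5, 'able': 6,
               'save': 7, 'the': 8, 'model': 9, 'and': 10, 'load': 11, 'from': 12}
text2 = ['Tokenizer class has a function to save data into JSON format. '
         'The accepted answer clearly demonstrates the tokenizer.']

merged = merge_word_index(word_index1, simple_tokens(text2[0]))

path = os.path.join(tempfile.mkdtemp(), 'file.pickle')
with open(path, 'wb') as handle:  # 'wb' overwrites, so the file holds one object
    pickle.dump(merged, handle)
with open(path, 'rb') as handle:
    loaded = pickle.load(handle)
print(loaded)
```

For these two strings this should give 24 entries, from `'to': 1` through `'demonstrates': 24`. With the real Keras objects, `sorted(tokenizer_t2.word_index, key=tokenizer_t2.word_index.get)` lists the second tokenizer's words in index order and could be passed as `new_words` instead of `simple_tokens(...)`; here it should introduce the same new words in the same order.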