I want to know how to word tokenize the following sentence (string):
"I am good. I e.g. wash the dishes."
into the following words:
["I", "am", "good", ".", "I", "e.g.", "wash", "the", "dishes"]
The problem is with abbreviations like "e.g.":
NLTK's word_tokenize splits it into two tokens, ["e.g", "."]
I tried using Punkt trained with "e.g."
to sentence-tokenize the string first, but I realised that word-tokenizing the resulting sentences would give me the same result.
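
To make that attempt concrete, a minimal sketch of the sentence-splitting step might look like the following. It assumes the abbreviation is registered by hand through `PunktParameters` rather than learned from a training corpus, and the variable names are illustrative only:

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Assumption: "e.g." is added manually as a known abbreviation.
# Punkt stores abbreviations lowercased and without the trailing period.
params = PunktParameters()
params.abbrev_types.add("e.g")

# Passing a PunktParameters object reuses it instead of training on text.
sent_tokenizer = PunktSentenceTokenizer(params)

text = "I am good. I e.g. wash the dishes."
sentences = sent_tokenizer.tokenize(text)
print(sentences)
```

This only keeps the period after "e.g" from ending a sentence; it does not change how each sentence is later split into words, which is where the abbreviation gets broken up.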
Any thoughts on how I could achieve this?
Note: I am restricted to using NLTK.