I have the following sentence:
'25) Figure 9:“lines are results of two-step adsorption model” -> What method/software was used for the curve fitting?'
I would like the colon to be split off as its own token instead of staying attached to the neighbouring characters.
By default, here is what spaCy returns:
print([w.text for w in nlp('25) Figure 9:“lines are results of two-step adsorption model” -> What method/software was used for the curve fitting?')])
['25', ')', 'Figure', '9:“lines', 'are', 'results', 'of', 'two', '-', 'step', 'adsorption', 'model', '”', '-', '>', 'What', 'method', '/', 'software', 'was', 'used', 'for', 'the', 'curve', 'fitting', '?']
What I would like to get is:
['25', ')', 'Figure', '9', ':', '“', 'lines', 'are', 'results', 'of', 'two', '-', 'step', 'adsorption', 'model', '”', '-', '>', 'What', 'method', '/', 'software', 'was', 'used', 'for', 'the', 'curve', 'fitting', '?']
I tried adding a special-case rule to the tokenizer:
from spacy.symbols import ORTH

# Add a special-case rule that maps the string ":" to a single ":" token
special_case = [{ORTH: ":"}]
nlp.tokenizer.add_special_case(":", special_case)
But this has no effect: the print statement above still returns the same tokens, with '9:“lines' kept as a single token.
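For context, special cases are matched against whole whitespace-delimited chunks (or what is left of them after prefixes and suffixes are stripped), so a rule for ":" only applies when the chunk is exactly ":". One direction that may work instead is treating the colon and the opening quote as infixes. The following is a minimal sketch, not a verified solution for every input: it assumes spaCy v3, uses a blank English pipeline in place of the `nlp` object above, and the two extra patterns are my own choice.

```python
import spacy
from spacy.util import compile_infix_regex

# Assumption: a blank English pipeline stands in for the `nlp` used above.
nlp = spacy.blank("en")

# Extend the default infix patterns so ":" and "“" split inside a chunk.
infixes = list(nlp.Defaults.infixes) + [r":", r"“"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

text = ('25) Figure 9:“lines are results of two-step adsorption model” '
        '-> What method/software was used for the curve fitting?')
tokens = [t.text for t in nlp.tokenizer(text)]
print(tokens)
```

Note that an unconditional ":" infix would presumably also split strings such as "10:30", so a narrower pattern may be preferable depending on the data.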