I'm fairly new to spaCy and NER. I need to label many examples of short-form text data, and I want to map company names to a custom entity label, CUSTOM.
Example descriptions:
Amazon1337XS324, Amazon4357YT322, *Google, Just *Eat
I am currently labeling the training data. My question is whether I should label the entire word as the entity or only the company name, e.g. "Amazon1337XS324" or "Amazon", "*Google" or "Google", and "Just *Eat" or "Just Eat".
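Because spaCy entity spans have to start and end on token boundaries, one quick way to see what the two labeling options mean is to run the examples through the tokenizer and try both offsets. This is a minimal sketch, assuming spaCy v3 and a blank English pipeline (`spacy.blank("en")`, which has only the rule-based tokenizer and no trained NER); the `check_alignment` helper is hypothetical.

```python
import spacy

# Blank English pipeline: rule-based tokenizer only, no trained NER component.
nlp = spacy.blank("en")


def check_alignment(text: str, start: int, end: int, label: str = "CUSTOM"):
    """Hypothetical helper: return the entity span if (start, end) falls on
    token boundaries, otherwise None."""
    doc = nlp(text)
    # alignment_mode="strict" (the default) returns None when the character
    # offsets cut through a token instead of starting/ending on its edges.
    return doc.char_span(start, end, label=label, alignment_mode="strict")


text = "Amazon1337XS324"
doc = nlp(text)
print([t.text for t in doc])  # ['Amazon1337XS324'] -> a single token

# Labeling only "Amazon" (characters 0-6) does not line up with any token...
print(check_alignment(text, 0, 6))  # None
# ...whereas labeling the whole word does.
print(check_alignment(text, 0, len(text)))  # Amazon1337XS324

# "expand" snaps a partial label out to every token it touches.
print(doc.char_span(0, 6, label="CUSTOM", alignment_mode="expand"))  # Amazon1337XS324

# Inspect how the other examples are tokenized before choosing offsets.
for sample in ["*Google", "Just *Eat"]:
    print(sample, "->", [t.text for t in nlp(sample)])
```

If the default tokenizer keeps the ID glued to the name, an annotation covering only "Amazon" cannot be turned into a span at all, so with that tokenizer the choice is partly made for you; as far as I know, spaCy also warns about such misaligned entities when converting character offsets into training tags.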
From this previous post it seems I shouldn't try to remove information that the NER model would find useful. Also, in many labeling tutorials the entire word is always labeled. However, in my use case the "non-descriptive" part of the word can change from one record to the next, as in the Amazon example, and could end up being noise for the model.
I also don't understand the following: if I only label "Amazon" or "Google" as entities for spaCy's NER model, and new examples come in with many new characters attached in the same word (e.g. Amazon1337XS325, Amazon1337XS326), will the NER model still be able to identify "Amazon" or "Google" as CUSTOM?
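Whether "Amazon" can be learned separately from a changing suffix depends on whether it ends up as its own token. One possible workaround, sketched here under the assumption of spaCy v3, is an extra tokenizer infix rule. The letter-to-digit split below is a hypothetical rule chosen for these examples, not something spaCy does by default.

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Hypothetical extra rule: a zero-width infix that splits wherever a letter is
# immediately followed by a digit, so the company name becomes its own token.
letter_digit_boundary = r"(?<=[A-Za-z])(?=[0-9])"
infixes = list(nlp.Defaults.infixes) + [letter_digit_boundary]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

for text in ["Amazon1337XS324", "Amazon1337XS325", "Amazon4357YT322"]:
    print([t.text for t in nlp(text)])
# ['Amazon', '1337XS', '324']
# ['Amazon', '1337XS', '325']
# ['Amazon', '4357YT', '322']

# With this tokenizer, a label covering only "Amazon" now aligns with a token.
print(nlp("Amazon1337XS326").char_span(0, 6, label="CUSTOM"))  # Amazon
```

The same tokenizer would have to be used for both training and prediction, and the rule also splits any other word where a letter is followed by a digit, so it is a trade-off worth testing on the real data.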