Sorry if this is a repeat, but I couldn't find an existing answer, and I would at least like to know whether there is a clean way to do this. I have a passage of text from which I need to extract certain entities:
- Any alphanumeric string, like: PQ1234, Z123, etc.
- Any alphanumeric string followed by a number after a space: PQ1234 01, Z123 08
- Any alphanumeric string followed by two space-separated numbers: PQ1234 01 02, Z123 07 08

As a concrete example, the strings in bold in the passage below should be extracted:
01: Once, there was a boy named **AZ009** who became bored when he watched over the village **PQ123 01** sheep grazing on the **B0199**. To entertain himself, he sang out, “**R0199 01 09**! **R0199 01 09**! **R0199 01 09** is chasing the sheep!”
Everything else should be ignored. I attempted this using spaCy's NOUN and PROPN part-of-speech filters, along with string methods like isalpha and isdigit for further filtering, but the approach is becoming too rule-based and I have not been able to get it working well.
I am new to NLP, so I wanted to know whether there is a smarter way, or whether a regex rule would do this better.
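To make the question concrete, here is a minimal sketch of the kind of regex rule I have in mind. It rests on assumptions of mine that may not hold for all real data: an entity is one or more uppercase letters followed by digits, and the trailing numbers are separated by single spaces. The names `ENTITY_RE` and `extract_entities` are just placeholders I made up.

```python
import re

# Assumed shape of an entity: uppercase letters, then digits,
# then optionally one or two more numbers, each after a single space.
ENTITY_RE = re.compile(r"\b[A-Z]+\d+(?: \d+){0,2}\b")


def extract_entities(text):
    """Return every substring of text that matches the assumed entity shape."""
    return ENTITY_RE.findall(text)


passage = (
    "01: Once, there was a boy named AZ009 who became bored when he watched "
    "over the village PQ123 01 sheep grazing on the B0199. To entertain "
    "himself, he sang out, “R0199 01 09! R0199 01 09! R0199 01 09 is "
    "chasing the sheep!”"
)

print(extract_entities(passage))
# ['AZ009', 'PQ123 01', 'B0199', 'R0199 01 09', 'R0199 01 09', 'R0199 01 09']
```

One weakness I can already see: an unrelated number that happens to follow an entity (say "Z123 5 sheep") would be swallowed into the match, so I am not sure a regex alone is enough.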
Thanks