I'm using XLM-R from Hugging Face, and I need to do some token filtering. Is there a way to tell whether a token belongs to a specific language?
For example, tokens with IDs 50-500 are English tokens, and tokens with IDs 800-1200 are Arabic.
I think I could use another model to classify them, but I thought there might be a neat trick I don't know about.
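One cheap heuristic to try before reaching for a second model is a minimal sketch that assumes nothing about how token IDs are laid out (the ID ranges above are only illustrative). Instead, it groups each vocabulary entry by the Unicode script of its letters, using the standard-library `unicodedata` module. Note that script is not the same as language: Arabic script is also used for Persian and Urdu, and Latin script is shared by English and many other languages. The helper names `token_script` and `ids_by_script` are hypothetical. The only Hugging Face calls, shown in comments, are the tokenizer's `get_vocab()` and `all_special_tokens`.

```python
import unicodedata
from collections import Counter

# SentencePiece word-boundary marker that prefixes many XLM-R tokens.
SP_SPACE = "\u2581"


def token_script(token: str) -> str:
    """Return the dominant script of a token's letters ('LATIN', 'ARABIC', ...) or 'OTHER'.

    Heuristic (an assumption, not an XLM-R feature): the first word of a letter's
    Unicode name, e.g. 'ARABIC LETTER MEEM' -> 'ARABIC', is taken as its script.
    """
    counts = Counter()
    for ch in token.replace(SP_SPACE, ""):
        if not ch.isalpha():  # skip digits, punctuation, combining marks
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "OTHER"
    return counts.most_common(1)[0][0]


def ids_by_script(vocab: dict[str, int], script: str,
                  skip: frozenset[str] = frozenset()) -> list[int]:
    """Return the sorted IDs of vocabulary tokens whose dominant script is `script`."""
    return sorted(i for tok, i in vocab.items()
                  if tok not in skip and token_script(tok) == script)


# Hypothetical usage with a real tokenizer (requires `transformers`):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
# arabic_ids = ids_by_script(tok.get_vocab(), "ARABIC", frozenset(tok.all_special_tokens))

if __name__ == "__main__":
    # Toy vocabulary standing in for tok.get_vocab().
    toy_vocab = {"<s>": 0, "\u2581the": 1, "\u2581مرحبا": 2, "ing": 3, "ال": 4, "123": 5}
    print(ids_by_script(toy_vocab, "ARABIC", frozenset({"<s>"})))  # [2, 4]
    print(ids_by_script(toy_vocab, "LATIN", frozenset({"<s>"})))   # [1, 3]
```

Special tokens such as `<s>` are passed in `skip` because their letters would otherwise be counted as Latin. Subword pieces such as `ing` or `ال` get a script but not a language, so tokens shared between languages that use the same script cannot be separated this way.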