Is there a way to tell which language a token is from?

Asked Nov 10 '22 at 14:10

Active Nov 10 '22 at 14:10

Viewed 12 times

I'm using XLM-R from Hugging Face, and I need to do some token filtering. Is there a way to tell whether a token belongs to a specific language?

For example, I'm hoping for a mapping along the lines of: token IDs 50-500 are English tokens, and token IDs 800-1200 are Arabic.

I think I could use another model to classify them, but I thought there might be a neat trick I didn't know about.
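To make the kind of filter I have in mind concrete, here is a rough sketch of one heuristic that avoids a second model: bucketing token strings by Unicode script. The names `guess_script` and `filter_by_script` are hypothetical helpers, not part of any library, and the big assumption is that script is a usable stand-in for language, which it only partly is (Latin script covers English, French, Turkish and many others).

```python
import unicodedata

# SentencePiece word-boundary marker that prefixes many XLM-R token strings.
SP_MARKER = "\u2581"


def guess_script(token: str) -> str:
    """Return a coarse script label for a token string (hypothetical helper).

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "ARABIC", "CYRILLIC") is a good enough script label.
    """
    text = token.replace(SP_MARKER, "")
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and symbols
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split(" ")[0])
    if not scripts:
        return "NONE"  # no alphabetic characters at all
    if len(scripts) == 1:
        return scripts.pop()
    return "MIXED"  # letters from more than one script


def filter_by_script(tokens, script):
    """Keep only the tokens whose guessed script equals `script`."""
    return [t for t in tokens if guess_script(t) == script]
```

The token strings themselves could come from something like `tokenizer.convert_ids_to_tokens(...)` or `tokenizer.get_vocab()`. This only separates scripts, not languages, so it would distinguish Arabic from English but not English from French.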

Faisal Hejary

0 Answers