I have words in the Hebrew language. Part of them are originally in English, and part of them are 'Hebrew English', meaning that those are words that are originally from English but are written with Hebrew words. For example: 'insulin' in Hebrew is "אינסולין" (Same phonetic sound).
I have a simple binary dataset. X: words (Written with Hebrew characters) y: label 1 if the word is originally in English and is written with Hebrew characters, else 0
I've tried using the classifier, but the input for it is full text, and my input is just words.
I don't want any MASKING to happen, I just want simple classification.
Is it possible to use BERT for this mission? Thanks