I was trying to compare two Spark DataFrames that contain Japanese characters, and some characters look the same but are actually different to the program, such as プ (U+30D7) vs プ (U+30D5 U+309A).
If you run them through a UTF-8 encoder:
プ utf-8 = \xE3\x83\x97
プ utf-8 = \xE3\x83\x95\xE3\x82\x9A
It looks like フ (\xE3\x83\x95) + the little-circle semi-voiced sound mark (\xE3\x82\x9A) = プ
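For reference, here is a minimal Java sketch that reproduces what I'm seeing outside of Spark. The class name `KanaBytes` and the helper `utf8Hex` are just made up for illustration, and the code points U+30D7, U+30D5 and U+309A are what I decoded from the bytes above, so treat them as my assumption:

```java
import java.nio.charset.StandardCharsets;

public class KanaBytes {
    // Render a string's UTF-8 bytes as \xNN escapes (hypothetical helper).
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String single = "\u30D7";      // the one-code-point form
        String pair = "\u30D5\u309A";  // base kana followed by the semi-voiced sound mark

        System.out.println(utf8Hex(single));      // \xE3\x83\x97
        System.out.println(utf8Hex(pair));        // \xE3\x83\x95\xE3\x82\x9A
        System.out.println(single.equals(pair));  // false
    }
}
```

So the two strings render identically but `equals` treats them as different, which is the same mismatch I get when comparing the DataFrames.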
What is this difference called, and is there any way to convert between the two forms in Java/Scala?
Thank you.