
Currently, Brenda’s gets converted to Brenda?s, but I want it to be converted to Brenda's. I see that the Normalizer class can remove accents from letters, but I do not need that; I want to convert \u2019 into an apostrophe. I also want this to work for other punctuation in the future.

Brandon Hu
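A small sketch of the behaviour described in the question, assuming the "remove accents" approach is the usual NFD-then-strip-combining-marks trick (the question does not show its code, so NormalizerDemo and stripAccents are hypothetical names). Decomposing ö yields o plus a combining diaeresis that can be dropped, but U+2019 has no Unicode decomposition at all, so java.text.Normalizer passes it through unchanged:

```java
import java.text.Normalizer;

class NormalizerDemo {
    // Hypothetical helper: decompose to NFD, then drop all combining marks (\p{M}).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // U+00F6 decomposes to 'o' + U+0308 COMBINING DIAERESIS, so the mark is removed.
        System.out.println(stripAccents("Sj\u00F6gren").equals("Sjogren")); // prints true
        // U+2019 RIGHT SINGLE QUOTATION MARK has no decomposition, so it survives as-is.
        System.out.println(stripAccents("Brenda\u2019s").equals("Brenda\u2019s")); // prints true
    }
}
```

So Normalizer alone cannot turn ’ into '; that mapping has to come from somewhere else.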
  • See https://stackoverflow.com/questions/1490218/utf-16-to-ascii-conversion-in-java – Frederic Perron Jul 14 '21 at 17:10
  • Do you use Eclipse and just look at the console output? – dan1st Jul 14 '21 at 17:20
  • @FrédéricPerron Tried that solution just now but it does not convert ’ properly either. It converts into a question mark with a box around it now instead of just a question mark. – Brandon Hu Jul 14 '21 at 17:21
  • Convert where and in what circumstances? – g00se Jul 14 '21 at 17:24
  • @g00se I am trying to implement a function to convert Unicode to ASCII, because right now my website displays ? wherever it does not recognize a Unicode character. I have tried the Stack Overflow solution above, which was to use a byte array, but it did not resolve the issue. – Brandon Hu Jul 14 '21 at 17:46
  • @BrandonHu Well, I think you're trying to solve the wrong problem here. Instead of converting certain characters to another ones (which will be very hard, as rzwitserloot already pointed out), you are better off setting up your character encodings properly. If you use UTF-8 everywhere, you won't have the problem you described at all. – MC Emperor Jul 15 '21 at 08:46
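The ? most likely appears when the text is encoded into a character set that has no slot for ’. Exactly which layer of the website does this is not known from the thread, so the following is only an illustration with hypothetical names (EncodingDemo, roundTrip). String.getBytes(Charset) replaces unmappable characters with the charset's default replacement, which is ? for US-ASCII, whereas UTF-8 round-trips the text intact:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: encode with the given charset and decode again, mimicking
    // text that passes through a layer configured with that charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) silently replaces unmappable characters with the
        // charset's default replacement bytes ('?' for US-ASCII).
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "Brenda\u2019s";
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // prints Brenda?s
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // prints true
    }
}
```

This matches MC Emperor's point: if every layer agrees on UTF-8, the character is never lost and no conversion table is needed.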

1 Answer


There is nothing baked into Java that does this, and the general principle is incredibly complicated. For example, 'asciification' of ü to ASCII depends on the language, and even if you have some idea of the locale, that's completely useless:

Imagine a Norwegian named Sjögren moves to Germany and signs up on a website there that is in German and highly Germany-focused, and you're building that website and want to asciify the name. You'd go: all right, that turns into Sjoegren. Except that would be wrong, because in Norway the ö would be asciified as a plain o.

Effectively, then, what you want is generally impossible. Still, there's a 'best effort' idea where you turn, e.g., every ö into oe and every \u2019 into ', but as far as I know there is no standard conversion table available. The fact that ö is asciified as o in Norway but as oe in Germany strongly suggests that any such table is a guesstimate at best (more a wild stab in the dark), which in turn suggests that one probably doesn't exist at all.

You can write it yourself, of course.

rzwitserloot
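If you do write it yourself, a best-effort version is a hand-maintained lookup table. The sketch below is hypothetical (Asciifier, TABLE and asciify are illustrative names, and the entries are only examples): it maps a few typographic punctuation marks to ASCII, bakes in the German-style ö to oe choice that the answer warns is wrong for Norwegian names, and replaces anything else outside ASCII with ?. It assumes Java 9+ for Map.of.

```java
import java.util.Map;

class Asciifier {
    // Hypothetical, hand-maintained table: Unicode code point -> ASCII replacement.
    // The U+00F6 -> "oe" entry follows the German convention, which is wrong for Norwegian names.
    private static final Map<Integer, String> TABLE = Map.of(
            0x2018, "'",    // LEFT SINGLE QUOTATION MARK
            0x2019, "'",    // RIGHT SINGLE QUOTATION MARK
            0x201C, "\"",   // LEFT DOUBLE QUOTATION MARK
            0x201D, "\"",   // RIGHT DOUBLE QUOTATION MARK
            0x2013, "-",    // EN DASH
            0x2014, "-",    // EM DASH
            0x2026, "...",  // HORIZONTAL ELLIPSIS
            0x00F6, "oe"    // LATIN SMALL LETTER O WITH DIAERESIS
    );

    // Replaces each code point via TABLE, keeps plain ASCII, and uses '?' for anything else.
    static String asciify(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String replacement = TABLE.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append('?');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(asciify("Brenda\u2019s")); // prints Brenda's
    }
}
```

Iterating by code point rather than by char means a character outside the Basic Multilingual Plane (an emoji, say) becomes a single ? instead of two. Extending the table for other punctuation later is just a matter of adding entries.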