0

I am trying to reverse every run of Hebrew characters inside a string. Let's say I have this string (instead of Hebrew letters, I will be using symbols):

§♀♠♪ this is my message♣♠♦►♣

(You can probably tell which character is in which language.) I want the character set §♀♠♪ to be replaced with ♪♠♀§.

But I want message♣♠♦►♣ to be replaced with message♣►♦♠♣, so only the English word inside it stays unreversed.

How can I do that? (Yes, I know I can't use these symbols in a regular string, but this is an example.)

Pshemo
NonameSL
  • Is there a range in Unicode where the characters you are looking for all exist? – Jason Sperske Mar 21 '14 at 17:28
  • you can verify the ASCII codes for all the alphabets from aA to zZ and rectify them from the given string. – Aditya Peshave Mar 21 '14 at 17:30
  • First, can you explain how `♣►♦♠♣` is an English word? Also, you seem confused about [bidi](http://en.wikipedia.org/wiki/Bi-directional_text). – Elliott Frisch Mar 21 '14 at 17:31
  • @ElliottFrisch OP isn't calling it an English word, he's using that as a placeholder in his example for Hebrew characters. – JWiley Mar 21 '14 at 17:37
  • refer this link for the ASCII codes.. http://www.ascii.cl/htmlcodes.htm – Aditya Peshave Mar 21 '14 at 17:37
  • Can you post an example with Hebrew text and the expected output, so we can actually run some tests before posting an answer? – Pshemo Mar 21 '14 at 17:39
  • 1
    Is the string originally written in reverse Hebrew and you are trying to un-reverse it? `\p{BidiClass:R}` –  Mar 21 '14 at 17:41
  • @JWiley I think I get what he's asking for now. Elliott (אֱלִיָּהוּ), but that will require people to type the Hebrew in backwards. – Elliott Frisch Mar 21 '14 at 17:41
  • Wow, I wonder how a Chinese mix is handled TtoB or BtoT. –  Mar 21 '14 at 17:52
  • @JasonSperske Actually yes, there is (The letters are אבגדהוזחטיכךלמםנןסעפצץקרשת btw): \u05d0 to \u05d9,\u05da to \u05df, \u05e0 to \u05e2,\u05e4 to \u05e9 and \u05ea. – NonameSL Mar 21 '14 at 21:17
  • @ElliottFrisch that is EXACTLY why I want to reverse the Hebrew letters: to create a method that takes the Hebrew words/letter combinations and FLIPS them, only them. – NonameSL Mar 21 '14 at 21:18
  • @sln no, it is originally written in unreversed Hebrew and I am trying to reverse it. – NonameSL Mar 21 '14 at 21:19
  • So you could combine this answer http://stackoverflow.com/questions/3835917/how-do-i-specify-a-range-of-unicode-characters with some code to make something. – Jason Sperske Mar 21 '14 at 23:45

2 Answers

1

This solution is based on the example provided by the OP (the one with ♣♠♦►♣) and wasn't tested on real data.

  • You should be able to find sequences of two or more Hebrew characters with `\p{InHebrew}{2,}`.

  • Once you have found them, you can use the `StringBuilder#reverse` method to reverse them (`String` itself has no `reverse` method).

  • The last step is to use `appendReplacement` and `appendTail` from `Matcher` to build a new string with the matched parts updated.

Here is an example which should do what you want:

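import java.util.regex.Matcher;
import java.util.regex.Pattern;

String yourString = ...; // place for your string

// Match runs of two or more characters from the Unicode Hebrew block
Pattern p = Pattern.compile("\\p{InHebrew}{2,}");
Matcher m = p.matcher(yourString);

StringBuffer sb = new StringBuffer();

while (m.find()) {
    // Reverse the matched run; quoteReplacement keeps the replacement literal
    String reversed = new StringBuilder(m.group()).reverse().toString();
    m.appendReplacement(sb, Matcher.quoteReplacement(reversed));
}
// Copy whatever follows the last match
m.appendTail(sb);

String reversedSpecial = sb.toString();
System.out.println(reversedSpecial);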
Pshemo
  • @sln Lawyers always told me to use small caps when saying important things and you want people to read them :) – Pshemo Mar 21 '14 at 17:58
  • 1
    Hey, hey .. It's supposed to flash for 3 seconds then disappear. –  Mar 21 '14 at 18:03
  • Thank you so much, this works! But instead of being spoon-fed, there is one thing I would like to know and didn't understand... What's the {InHebrew} part and how did it get the Hebrew letters, or if it didn't, then what DID get the Hebrew letters? – NonameSL Mar 22 '14 at 16:15
  • If you take a look at the [Pattern documentation](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) you will find an example with `\p{InGreek}`, which matches "A character in the Greek block". Now if you take a look at the [block definition](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#ubc) you will see it is `InXXX`, where the `XXX` part is a valid argument of `Character.UnicodeBlock.forName(...)`, and a closer look at the source code of this class shows that `Hebrew` is defined there as the code points in the range `0590 - 05FF`. – Pshemo Mar 22 '14 at 16:34
  • So basically `\p{InHebrew}` is similar to the `[\u0590-\u05FF]` character class. – Pshemo Mar 22 '14 at 16:36
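
The comparison in the last two comments can be checked with a small sketch. The class name `InHebrewCheck` and its helper methods are hypothetical names made up for this illustration; only `Pattern` and the two regular expressions come from the thread. It tests a few sample characters on and around the block boundaries rather than proving the equivalence in general.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, written only to illustrate the comments above.
class InHebrewCheck {
    // Block-based pattern, as used in the answer
    static final Pattern BLOCK = Pattern.compile("\\p{InHebrew}");
    // Explicit range, as suggested in the comment
    static final Pattern RANGE = Pattern.compile("[\\u0590-\\u05FF]");

    static boolean inBlock(char c) {
        return BLOCK.matcher(String.valueOf(c)).matches();
    }

    static boolean inRange(char c) {
        return RANGE.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Samples: a Latin letter, the code points just outside the block,
        // both block boundaries, and the first and last Hebrew letters.
        char[] samples = { 'a', '\u058F', '\u0590', '\u05D0', '\u05EA', '\u05FF', '\u0600' };
        for (char c : samples) {
            System.out.printf("U+%04X block=%b range=%b%n", (int) c, inBlock(c), inRange(c));
        }
    }
}
```

Note that the block also contains vowel points and punctuation, so it is wider than the 27 letters listed in the question's comments.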
0

Assume there is an output buffer holding the final string. When you encounter a Hebrew character, push it onto a stack; keep pushing until an English character is found, then pop all the letters from the stack into the output buffer. English letters are copied to the output buffer directly. At the end of the input, pop whatever is still left on the stack.
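
A minimal sketch of this stack idea, under a few assumptions that are not in the answer itself: the class and method names (`HebrewStackReverser`, `isHebrew`, `reverseHebrewRuns`) are hypothetical, "Hebrew character" is taken to mean any `char` in the Unicode Hebrew block, and every non-Hebrew character (not only English letters) flushes the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class illustrating the stack-based description above.
class HebrewStackReverser {

    // Assumption: "Hebrew" means the Unicode Hebrew block.
    static boolean isHebrew(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HEBREW;
    }

    static String reverseHebrewRuns(String input) {
        StringBuilder out = new StringBuilder(input.length());
        Deque<Character> stack = new ArrayDeque<>();

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isHebrew(c)) {
                // Hold Hebrew characters back until the run ends
                stack.push(c);
            } else {
                // Run ended: popping emits the held characters in reverse order
                while (!stack.isEmpty()) {
                    out.append(stack.pop());
                }
                out.append(c); // non-Hebrew characters are copied directly
            }
        }
        // Flush a run that reaches the end of the input
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hebrew letters are written as escapes to avoid source-encoding issues
        String input = "\u05d0\u05d1\u05d2 this is my message\u05d3\u05d4";
        System.out.println(reverseHebrewRuns(input));
    }
}
```

Unlike the regex answer, which only touches runs of two or more Hebrew characters, this version also passes single Hebrew characters through the stack, which leaves them unchanged. Both approaches reverse `char` by `char`, so combining marks such as vowel points end up in front of their base letter after reversal.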

象嘉道