How to replace non-ASCII characters in a sequence?

Question

Essentially, what this code does is:

Take an input.
Replace each run of identical characters whose length is greater than 2 with the run length followed by the character itself (e.g. jjjkkkkkllll becomes 3j5k4l). The input does not contain any numeric values.
Return the result.

The code:

private String replaceConsecutiveChars(String data) {
    char[] dataChars = data.toCharArray();

    int i = 0;
    int k = 0;
    Character charType = null;
    for(Character c : dataChars) {
        if(k == dataChars.length - 1 && i >= 2) {
            data = data.replace(repeat(String.valueOf(charType), ++i), (i + Character.toString(charType)));
            break;
        }

        if(i == 0) {
            charType = c;
            i++;
        }else if(c == charType) {
            i++;
        }else if(c != charType && i > 2) {
            data = data.replace(repeat(String.valueOf(charType), i), (i + Character.toString(charType)));

            i = 1;
            charType = c;
        }else if(c != charType && i <= 2) {
            i = 1;
            charType = c;
        }

        k++;
    }

    return data;
}

private String repeat(String s, int n) {
    return Stream.generate(() -> s).limit(n).collect(Collectors.joining(""));
}

However, my implementation only seems to work with the ASCII character set, and I am trying to get it to work with the full Unicode character set. For example:

The input ddddddddkkkkkpppp will correctly output 8d5k4p.
The input êêêêÌÌÌÌÌÌÌØØØ will incorrectly output êêêêÌÌÌÌÌÌÌØØØ (unchanged).
The input rrrrrêêêêÌÌÌÌÌkkkkØØØ will incorrectly output 5rêêêêÌÌÌÌÌ4kØØØ.

Why is this?

In addition, is there a better way I could do this than the way I'm doing it right now?

The only place you use that is for the character you are saving over the loops. Seems rather silly. Why don't you use a char and assign it the value of '1' initially as you know you will never loop over a number in your loop? — Rabbit Guy, Jul 06 '17 at 19:02

Luciano van der Veekens · Accepted Answer · 2017-07-06T19:14:31.020

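You are comparing instances of Character using ==, which will not work as expected because the operator compares object references instead of values. It appears to work for ASCII input because autoboxing reuses cached Character instances for values up to \u007F, so equal ASCII characters happen to be the same object; characters such as ê fall outside that range and are typically boxed into distinct objects.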
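To see the difference concretely, here is a minimal sketch. The class and method names (BoxedCharCompare, sameReference, sameValue) are made up for illustration. The Java Language Specification only guarantees shared boxed instances for chars from \u0000 to \u007F; the result for other chars depends on the runtime, so the second line of output is what one would usually see on OpenJDK, not a guarantee.

```java
class BoxedCharCompare {
    // Compares the two boxed objects by reference, as == does in the question.
    static boolean sameReference(Character a, Character b) {
        return a == b;
    }

    // Compares the wrapped char values instead of the references.
    static boolean sameValue(Character a, Character b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        char ascii = 'd';
        char latin = '\u00EA'; // ê, outside the guaranteed cache range

        // Each argument is boxed separately when the method is called.
        System.out.println(sameReference(ascii, ascii)); // true: cached instance
        System.out.println(sameReference(latin, latin)); // usually false on OpenJDK
        System.out.println(sameValue(latin, latin));     // true
    }
}
```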

A simple quick fix is to change the for-loop to:

for (char c : dataChars) {
    // loop body unchanged
}

Notice the change of types (Character to char). This way charType is automatically unboxed to the primitive char when comparing it to c.

Another solution is to replace every c == charType with c.equals(charType), and every c != charType with !c.equals(charType), so that values are compared rather than references.
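On the question of a better approach, one option is a single pass that builds the result in a StringBuilder instead of calling String.replace, which replaces every occurrence of the run and not only the one just counted. The following is a minimal sketch under stated assumptions, not the original code: the names RunLengthSketch and compressRuns are hypothetical, and it works on UTF-16 char units, so it assumes the input contains no characters outside the Basic Multilingual Plane.

```java
class RunLengthSketch {
    // Hypothetical helper: collapses each run of more than 2 identical
    // chars into "<count><char>" and copies shorter runs unchanged.
    static String compressRuns(String data) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < data.length()) {
            char current = data.charAt(start); // primitive char, so == compares values
            int end = start;
            // Advance end to the first position that no longer matches current.
            while (end < data.length() && data.charAt(end) == current) {
                end++;
            }
            int runLength = end - start;
            if (runLength > 2) {
                out.append(runLength).append(current);
            } else {
                for (int n = 0; n < runLength; n++) {
                    out.append(current);
                }
            }
            start = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compressRuns("ddddddddkkkkkpppp")); // prints 8d5k4p
    }
}
```

Because the loop only ever compares primitive char values, it behaves the same for ASCII and non-ASCII input.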
