Eclipse IDE processing emojis using surrogate pairs

Question

I can't find a clear answer to this: does the Eclipse IDE support emojis? I have read a lot about surrogate pairs here on Stack Overflow, but none of it settles the question for me.

I have to read a text file character by character, and I am using FileInputStream.

Would it be possible to process the emojis using surrogate pairs? I want to use a select few Apple emojis. These specifically: By "process them", I mean I would like to identify each one as that particular emoji when reading in the file.

If so, could someone show me an example?

Short answer: Yes, it’s possible. What does “process the emojis” mean? What do you want to do with them? — VGR, Nov 01 '16 at 19:32
By processing them, I mean that I would like to be able to identify them individually and return something based on which emoji it is. — Wanda, Nov 01 '16 at 19:44

score 1 · Accepted Answer · answered Nov 01 '16 at 20:36

InputStreams are for reading bytes; Readers are for reading characters. So you should use a Reader obtained from Files.newBufferedReader, or use a FileReader or InputStreamReader.
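To make the byte/character distinction concrete, here is a minimal sketch. It uses an in-memory byte array instead of a real file, and it assumes the data is UTF-8 encoded; the class and method names (ByteVsCharDemo, countBytes, countChars) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ByteVsCharDemo {
    // One read() per byte: this mirrors what a FileInputStream hands back.
    static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // One read() per UTF-16 char once a Reader decodes the bytes (UTF-8 assumed).
    static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face), written as its surrogate pair.
        byte[] data = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 4: the UTF-8 encoding is four bytes
        System.out.println(countChars(data)); // 2: the Reader yields a surrogate pair
    }
}
```

Neither count is 1, which is why reading by code point is the more convenient level for emojis.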

Although Java uses surrogate pairs inside a String to represent emojis and many other types of Unicode characters, you don’t need to deal with surrogate pairs directly. Surrogate values only exist because many character values are too large for a Java char type. If you read individual characters as int values (for example, with the CharSequence.codePoints method), you will get whole character values every time, and you will never see or have to deal with a surrogate value.
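As a small illustration of the difference between char values and code points, the sketch below inspects a two-character string. The class name SurrogateDemo and the helper codePointsOf are hypothetical; the String and Character methods are standard Java 8+ API.

```java
class SurrogateDemo {
    // Returns one int per whole character, never a lone surrogate
    // (as long as the string itself is well formed).
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600

        System.out.println(s.length());                             // 3 chars
        System.out.println(s.codePointCount(0, s.length()));        // 2 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        for (int cp : codePointsOf(s)) {
            // Character.charCount reports how many chars a code point needs.
            System.out.printf("U+%04X uses %d char(s)%n", cp, Character.charCount(cp));
        }
    }
}
```

The loop never sees 0xD83D or 0xDE00 on their own; it only sees 0x61 and 0x1F600.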

As of this writing, the face-style emojis sit in the Emoticons block, in part of the Supplemental Symbols and Pictographs block, and in three legacy characters of the Miscellaneous Symbols block. Other emojis are spread across further blocks, such as Miscellaneous Symbols and Pictographs.
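If you would rather test block membership by name than by numeric range, Character.UnicodeBlock can do that. This is only a sketch: the class and method names are made up, the SUPPLEMENTAL_SYMBOLS_AND_PICTOGRAPHS constant needs Java 9 or later, and the check accepts every code point in the three blocks, which is broader than a hand-picked set of ranges.

```java
class EmojiBlocks {
    // True if the code point lies anywhere in one of the three named blocks.
    // This is deliberately coarse: most of Miscellaneous Symbols is not emoji.
    static boolean inNamedBlock(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.EMOTICONS
            || block == Character.UnicodeBlock.SUPPLEMENTAL_SYMBOLS_AND_PICTOGRAPHS
            || block == Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS;
    }

    public static void main(String[] args) {
        System.out.println(inNamedBlock(0x1F600)); // true  (Emoticons)
        System.out.println(inNamedBlock(0x2639));  // true  (Miscellaneous Symbols)
        System.out.println(inNamedBlock('a'));     // false (Basic Latin)
    }
}
```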

Thus, using a BufferedReader and traversing the character data with ints might look like this:

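import java.io.BufferedReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.IntStream;

// The charset must match the file's encoding. Charset.defaultCharset() is the
// platform default; use StandardCharsets.UTF_8 if the file is UTF-8.
try (BufferedReader reader =
    Files.newBufferedReader(Paths.get(filename), Charset.defaultCharset())) {

    // codePoints() yields whole code points, so no surrogate values show up here.
    IntStream chars = reader.lines().flatMapToInt(String::codePoints);
    chars.forEachOrdered(c -> {
        if ((c >= 0x2639 && c <= 0x263b) ||    // three faces in Miscellaneous Symbols
            (c >= 0x1f600 && c < 0x1f650) ||   // Emoticons block
            (c >= 0x1f910 && c < 0x1f930)) {   // part of Supplemental Symbols and Pictographs

            processEmoji(c);
        }
    });
}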
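Since you want to return something based on which emoji it is, one possible shape for that step is a lookup table keyed by code point. This is a sketch, not the only way to do it: the class EmojiLookup, its methods, and the three entries and labels are all illustrative choices, and Map.of needs Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class EmojiLookup {
    // Hypothetical table: pick whichever code points and labels you need.
    private static final Map<Integer, String> LABELS = Map.of(
        0x1F600, "grinning face",
        0x1F622, "crying face",
        0x2639,  "frowning face");

    // Returns the label for one code point, or "unknown" if it is not listed.
    static String describe(int codePoint) {
        return LABELS.getOrDefault(codePoint, "unknown");
    }

    // Scans text by code point and collects the labels of the listed emojis, in order.
    static List<String> labelsIn(String text) {
        return text.codePoints()
            .filter(cp -> LABELS.containsKey(cp))
            .mapToObj(cp -> LABELS.get(cp))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describe(0x1F600));              // grinning face
        System.out.println(labelsIn("hi \uD83D\uDE22!"));   // [crying face]
    }
}
```

Something like describe could serve as the body of processEmoji in the loop above.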

Thank you very much, this helped a lot. I appreciate your detailed response. I had a feeling I wasn't using the correct file reader. — Wanda, Nov 01 '16 at 21:33
