If you're open to using a third-party library that works with Java 8 or above, Eclipse Collections (EC) can solve this problem using a primitive Bag to count characters. Use a CharBag if char values are required, or an IntBag if code points (int values) are required. A Bag is a simpler data structure for counting things, and a primitive Bag may be backed by a primitive HashMap so that the counts are not boxed as Integer or Long objects. A Bag also avoids the problem a HashMap has in Java, where looking up a missing key returns null.
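For comparison, here is a minimal JDK-only sketch of the missing-key behaviour described above. The class and method names (MissingKeyDemo, countChars) are made up for illustration and are not part of EC. A HashMap<Character, Integer> boxes both the keys and the counts, and get returns null for a character that never appeared:

```java
import java.util.HashMap;
import java.util.Map;

class MissingKeyDemo
{
    // Hypothetical helper: counts char values with a plain HashMap (boxes keys and counts).
    static Map<Character, Integer> countChars(String word)
    {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : word.toCharArray())
        {
            // merge inserts 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        Map<Character, Integer> counts = countChars("AAABBB");
        System.out.println(counts.get('A'));             // 3
        System.out.println(counts.get('C'));             // null, 'C' was never added
        System.out.println(counts.getOrDefault('C', 0)); // 0, but the caller has to supply the default
    }
}
```

With a Bag, occurrencesOf gives the 0 directly, so there is no null to guard against and no default to remember.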
@Test
public void characterCountJava8()
{
    String word = "AAABBB";
    CharAdapter chars = Strings.asChars(word);
    CharBag charCounts = chars.toBag();
    Assertions.assertEquals(3, charCounts.occurrencesOf('A'));
    Assertions.assertEquals(3, charCounts.occurrencesOf('B'));
    Assertions.assertEquals(0, charCounts.occurrencesOf('C'));
    System.out.println(charCounts.toStringOfItemToCount());
}
Outputs:
{A=3, B=3}
CharAdapter and CharBag are primitive collection types available in EC. A CharBag is useful if you want to count char values. Notice that charCounts.occurrencesOf('C') returns 0 rather than the null a HashMap would return.
The following example counts code points, using emojis to make the result visually appealing. The counting code itself works with Java 8, but the conversion to Bag<String> relies on Character.toString(int), which was added in Java 11.
@Test
public void codePointCountJava11()
{
    String emojis = "🍎🍎🍎🍌🍌";
    CodePointAdapter codePoints = Strings.asCodePoints(emojis);
    IntBag emojiCounts = codePoints.toBag();
    int appleInt = "🍎".codePointAt(0);
    int bananaInt = "🍌".codePointAt(0);
    int pearInt = "🍐".codePointAt(0);
    Assertions.assertEquals(3, emojiCounts.occurrencesOf(appleInt));
    Assertions.assertEquals(2, emojiCounts.occurrencesOf(bananaInt));
    Assertions.assertEquals(0, emojiCounts.occurrencesOf(pearInt));
    System.out.println(emojiCounts.toStringOfItemToCount());
    Bag<String> emojiStringCounts = emojiCounts.collect(Character::toString);
    System.out.println(emojiStringCounts.toStringOfItemToCount());
}
Outputs:
{127820=2, 127822=3} // IntBag.toStringOfItemToCount()
{🍌=2, 🍎=3} // Bag<String>.toStringOfItemToCount()
CodePointAdapter and IntBag are primitive collection types available in EC. An IntBag is useful if you want to count int values. Notice that emojiCounts.occurrencesOf(pearInt) returns 0 rather than the null a HashMap would return.
I converted the IntBag to a Bag<String> to show the difference between printing int values and printing the characters they represent. You need to convert int code points back to String if you want to print anything readable.
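If you are on Java 8, where Character.toString(int) is not available, the conversion from a code point back to a String can be sketched with plain JDK calls. The class and method names here (CodePointToString, fromCodePoint) are hypothetical, and 127820 is simply one of the code points from the output above:

```java
class CodePointToString
{
    // Hypothetical helper: turns one code point into a String using Java 8 APIs only.
    static String fromCodePoint(int codePoint)
    {
        // Character.toChars returns one char for BMP code points and a surrogate pair otherwise.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args)
    {
        int codePoint = 127820; // 0x1F34C, taken from the IntBag output above
        String s = fromCodePoint(codePoint);
        System.out.println(s.length());                      // 2, a surrogate pair
        System.out.println(s.codePointCount(0, s.length())); // 1
        System.out.println(s.codePointAt(0) == codePoint);   // true
    }
}
```

A helper like this could presumably be passed to collect in place of Character::toString, though I have not shown that here.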
The comment Holger left on the accepted answer about grapheme clusters was insightful and helpful. Thank you! The codepoint solution here suffers from the same issue as all of the other codepoint solutions.
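To illustrate the grapheme cluster issue with a small JDK-only sketch: a letter followed by a combining accent is displayed as one character but counts as two code points. The example below uses java.text.BreakIterator as one possible way to count user-perceived characters. The names GraphemeDemo and countGraphemes are made up, and I am only assuming the simple combining-accent case here; how well BreakIterator handles longer emoji sequences may depend on the JDK version.

```java
import java.text.BreakIterator;

class GraphemeDemo
{
    // Hypothetical helper: counts character boundaries as seen by the JDK's BreakIterator.
    static int countGraphemes(String text)
    {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE)
        {
            count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT, displayed as a single character
        String accented = "e\u0301";
        System.out.println(accented.codePointCount(0, accented.length())); // 2
        System.out.println(countGraphemes(accented));                      // 1
    }
}
```

Any code point based count, including the IntBag one, would report the letter and the accent as two separate entries.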
Eclipse Collections 11.1 was compiled and released with Java 8. I wouldn't recommend staying on Java 8 any more, but wanted to point out this is still possible.
Note: I am a committer for Eclipse Collections.