I'm porting my JNA-based library to "pure" Java using the Foreign Function and Memory API ([JEP 424][1]) in JDK 19.
One frequent use case my library handles is reading (null-terminated) Strings from native memory. For most *nix applications, these are "C strings" and the `MemorySegment.getUtf8String()` method is sufficient for the task.
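For reference, a minimal round-trip using those UTF-8 convenience methods looks roughly like this (JDK 19 with `--enable-preview`; the class name and buffer size are just for illustration):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.MemorySession;

public class CStringDemo {
    public static void main(String[] args) {
        try (MemorySession session = MemorySession.openConfined()) {
            MemorySegment segment = MemorySegment.allocateNative(16, session);
            segment.setUtf8String(0, "hello");            // writes the bytes plus a trailing '\0'
            System.out.println(segment.getUtf8String(0)); // reads up to (not including) the '\0'
        }
    }
}
```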
Native Windows Strings, however, are stored in UTF-16 (LE). Referenced as arrays of `TCHAR` or as "Wide Strings", they are treated similarly to "C strings" except that each character consumes 2 bytes. JNA provides a `Native.getWideString()` method for this purpose, which invokes native code to efficiently iterate over the appropriate character set.
I don't see a UTF-16 equivalent to `getUtf8String()` (and the corresponding `set...()`) optimized for these Windows-based applications.
I can work around the problem with a few approaches:

- If I'm reading from a fixed-size buffer, I can create a `new String(bytes, StandardCharsets.UTF_16LE)` and:
  - if I know the memory was cleared before being filled, use `trim()`;
  - otherwise, `split()` on the null delimiter and extract the first element.
- If I'm just reading from a pointer offset with no knowledge of the total size (or a very large total size I don't want to instantiate into a `byte[]`), I can iterate character-by-character looking for the null, as sketched below.
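To make those two workarounds concrete, here is a minimal sketch of both (JDK 19 with `--enable-preview`; the `WideStrings` class, helper method names, and the explicit little-endian layout are my own choices, not anything the API provides):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.MemorySession;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WideStrings {

    // Explicit little-endian char layout so the sketch doesn't depend on platform byte order.
    private static final ValueLayout.OfChar UTF16LE_CHAR =
            ValueLayout.JAVA_CHAR.withOrder(ByteOrder.LITTLE_ENDIAN);

    // Workaround 1: fixed-size buffer -- copy the bytes, decode, then cut at the first null.
    static String getWideStringFromBuffer(MemorySegment segment) {
        String decoded = new String(segment.toArray(ValueLayout.JAVA_BYTE), StandardCharsets.UTF_16LE);
        int nul = decoded.indexOf('\0');
        return nul >= 0 ? decoded.substring(0, nul) : decoded;
    }

    // Workaround 2: unknown or very large size -- walk 2 bytes at a time until the null terminator.
    static String getWideString(MemorySegment segment, long offset) {
        StringBuilder sb = new StringBuilder();
        for (long pos = offset; ; pos += Character.BYTES) {
            char c = segment.get(UTF16LE_CHAR, pos);
            if (c == '\0') {
                return sb.toString();
            }
            sb.append(c);
        }
    }

    public static void main(String[] args) {
        try (MemorySession session = MemorySession.openConfined()) {
            // Simulate a native wide string: "wide" followed by a null terminator (10 bytes).
            MemorySegment segment = MemorySegment.allocateNative(10, session);
            segment.copyFrom(MemorySegment.ofArray("wide\0".getBytes(StandardCharsets.UTF_16LE)));
            System.out.println(getWideStringFromBuffer(segment)); // wide
            System.out.println(getWideString(segment, 0));        // wide
        }
    }
}
```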
While I certainly wouldn't expect the JDK to provide native implementations for every character set, I would think that Windows represents a significant enough usage share to support its primary native encoding alongside the UTF-8 convenience methods. Is there a method to do this that I haven't discovered yet? Or are there any better alternatives than the `new String()` or character-based iteration approaches I've described?