For Perl, don't use the cmp operator: by default it compares strings by code point rather than by Unicode collation order. Use the Unicode::Collate module instead:
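use strict;
use warnings;
use Unicode::Collate;

# Build the collator once and reuse it for every comparison.
my $Collator = Unicode::Collate->new;

sub compare_strs
{
    my ( $str1, $str2 ) = @_;
    # Treat vars as strings by quoting.
    # Possibly unnecessary, since cmp() compares its arguments as strings.
    return $Collator->cmp("$str1", "$str2");
}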
If you're worried about normalization (e.g., the order of combining marks), you can also use the Unicode::Normalize module.
In Java, use the Collator class, as described in the tutorial on comparing strings. For normalization, see the tutorial on normalizing text. The Normalizer class was introduced in Java 1.6 (Collator itself has been part of java.text since JDK 1.1); if you need normalization on earlier versions of Java, you will need to use something like the ICU libraries.
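To make the Java side concrete, here is a minimal sketch of how Collator and Normalizer might be combined. The class name CollationSketch and the helpers compareStrs and normalize are made up for illustration, and the choice of Locale.ROOT, NFC and tertiary strength is an assumption; pick whatever matches your data and the Perl side.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch only: class and method names are hypothetical.
class CollationSketch {
    // Assumption: locale-neutral rules. Use a specific Locale if you need
    // language-specific ordering.
    private static final Collator COLLATOR = Collator.getInstance(Locale.ROOT);

    static {
        // TERTIARY distinguishes base letters, accents and case.
        COLLATOR.setStrength(Collator.TERTIARY);
    }

    // Bring canonically equivalent sequences to one form (NFC here).
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Rough counterpart of the Perl compare_strs: normalize, then collate.
    static int compareStrs(String a, String b) {
        return COLLATOR.compare(normalize(a), normalize(b));
    }

    public static void main(String[] args) {
        String[] words = { "zebra", "Zebra", "apple", "\u00e9clair", "eclair" };

        // Collator implements Comparator, so it can drive a sort directly.
        Arrays.sort(words, CollationSketch::compareStrs);
        System.out.println(Arrays.toString(words));

        // Plain String.compareTo works on UTF-16 code units, so "Zebra"
        // sorts before "apple"; the collator puts it after.
        System.out.println("Zebra".compareTo("apple") < 0);
        System.out.println(compareStrs("Zebra", "apple") > 0);
    }
}
```

Normalizing before comparing means that "e" followed by a combining acute accent and the precomposed "é" are treated as the same string, which is the combining-mark concern mentioned above.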
Using the appropriate tools as described above should ensure that both environments behave according to the Unicode collation algorithm (and hence compatibly with one another), provided both sides are configured with the same locale and comparison strength.