
I wrote a small test program (see the end of this post) that uses libicu's uidna_IDNToASCII function to convert a Unicode domain name to Punycode.

$ g++ -std=c++11 -W -Wall test.cpp -licucore
$ ./a.out EXAMΠLE.com
xn--examle-s0e.com
$ ./a.out EXAMPLE.com
EXAMPLE.com

Punycoder.com confirms that xn--examle-s0e.com is the punycode for examπle.com (with the Greek and the ASCII both lowercased). But when I give my program the pure-ASCII EXAMPLE.com, libicu fails to lowercase any of it!

How can I convince libicu to lowercase pure-ASCII domain names too?

Here's the complete C++11 source code I'm using:

#include <cassert>   // assert
#include <cstdint>   // std::int32_t
#include <cstdio>
#include <string>
#include <unicode/uidna.h>
#include <unicode/ustring.h>
#include <vector>

std::string convert_utf8_to_idna(const std::string& input) {
    UErrorCode err = U_ZERO_ERROR;
    std::int32_t needed = 0;

    auto src = std::vector<UChar>(1000);
    (void)u_strFromUTF8WithSub(
        src.data(), src.size(), &needed,
        input.data(), input.size(),
        0xFFFD, nullptr, &err
    );
    src.resize(needed); // chop off the unused excess
    assert(err == U_ZERO_ERROR);

    auto dest = std::vector<UChar>(1000);
    needed = uidna_IDNToASCII(
        src.data(), src.size(),
        dest.data(), dest.size(),
        UIDNA_ALLOW_UNASSIGNED, nullptr, &err
    );
    assert(err == U_ZERO_ERROR);
    dest.resize(needed); // chop off the unused excess

    return std::string(dest.begin(), dest.end());
}

int main(int argc, char **argv) {
    std::string input = (argc >= 2) ? argv[1] : "example.com";
    std::string output = convert_utf8_to_idna(input);
    printf("%s\n", output.c_str());
}
Quuxplusone
  • I'm also stuck on a 10-year-old libicu4c release. It looks like `uidna_IDNToASCII` may be deprecated in newer releases, but I'm not sure what I'm supposed to replace it with. If the answer is "switch to this other thing that also incidentally solves the lowercasing issue," that'd be cool. – Quuxplusone Mar 21 '23 at 21:29
  • The Java version marks internal functions as deprecated, so maybe the C version does the same? In that case, I'd guess you can replace it with a public function that calls it. – arnt Aug 28 '23 at 09:02
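
One possible workaround for the behaviour described in the question, offered as a sketch rather than something confirmed against this particular ICU release: the IDNA2003 ToASCII operation (RFC 3490) only runs Nameprep, the step that does the case folding, on labels containing non-ASCII code points, which would explain why an all-ASCII label such as EXAMPLE comes back unchanged. Assuming the converted name is pure ASCII, and given that DNS compares ASCII letters case-insensitively, lowercasing the result afterwards should be harmless. The helper name `ascii_lowercase` is hypothetical and is not part of ICU.

```cpp
#include <string>

// Hypothetical helper (not an ICU function): lowercases only the ASCII
// letters A-Z and leaves every other byte untouched, so the result does
// not depend on the current locale.
std::string ascii_lowercase(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

In the program from the question, this could wrap the conversion result, for example `ascii_lowercase(convert_utf8_to_idna(input))`.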
