
What is the correct way to insert UTF-8 data into an OpenLDAP database? I have data in a std::wstring, which I decode from Latin1 with:

std::wstring converted = boost::locale::conv::to_utf<wchar_t>(line, "Latin1");
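For comparison, here is a standard-library-only sketch that goes straight from Latin1 bytes to UTF-8, without the intermediate std::wstring. The name latin1ToUtf8 is hypothetical and is not part of Boost or libldap. It relies on ISO-8859-1 mapping each byte value directly to the Unicode code point with the same number, so every non-ASCII byte becomes exactly two UTF-8 bytes:

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin1) bytes to UTF-8.
// Latin1 byte 0xNN is code point U+00NN, so bytes below 0x80 are copied
// unchanged and bytes 0x80-0xFF become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);  // worst case: every byte needs two bytes
  for (unsigned char c : in) {
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else {
      out += static_cast<char>(0xC0 | (c >> 6));    // lead byte 110xxxxx
      out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
    }
  }
  return out;
}
```

For example, the Latin1 string "caf\xE9" becomes the UTF-8 bytes "caf\xC3\xA9".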

When the string needs to be added to an LDAPMod structure, I use this function:

#include <boost/locale/encoding_utf.hpp>

std::string str8(const std::wstring& s) {
  return boost::locale::conv::utf_to_utf<char>(s);
}

to convert from std::wstring to a UTF-8 encoded std::string. It is used in my function that creates the LDAPMod array:

LDAPMod ** y::ldap::server::createMods(dataset& values) {
  LDAPMod ** mods = new LDAPMod*[values.elms() + 1];
  mods[values.elms()] = NULL;

  for(int i = 0; i < values.elms(); i++) {
    mods[i] = new LDAPMod;
    data & d = values.get(i);

    switch (d.getType()) {
      case NEW: mods[i]->mod_op = 0; break;
      case ADD: mods[i]->mod_op = LDAP_MOD_ADD; break;
      case MODIFY: mods[i]->mod_op = LDAP_MOD_REPLACE; break;
      case DELETE: mods[i]->mod_op = LDAP_MOD_DELETE; break;
      default: assert(false);
    }

    std::string type = str8(d.getValue(L"type"));
    mods[i]->mod_type = new char[type.size() + 1];
    std::copy(type.begin(), type.end(), mods[i]->mod_type);
    mods[i]->mod_type[type.size()] = '\0';

    mods[i]->mod_vals.modv_strvals = new char*[d.elms(L"values") + 1];
    for(int j = 0; j < d.elms(L"values"); j++) {
      std::string value = str8(d.getValue(L"values", j));
      mods[i]->mod_vals.modv_strvals[j] = new char[value.size() + 1];
      std::copy(value.begin(), value.end(), mods[i]->mod_vals.modv_strvals[j]);
      mods[i]->mod_vals.modv_strvals[j][value.size()] = '\0';
    }

    mods[i]->mod_vals.modv_strvals[d.elms(L"values")] = NULL;
  }

  return mods;
}
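The same allocate, copy and terminate sequence appears twice in createMods (for mod_type and for each entry of mod_vals.modv_strvals). A minimal sketch of a helper that factors it out is shown below; newCString is a hypothetical name, not a libldap function. Because the buffer comes from new[], it should be released with delete[]; passing such memory to ldap_mods_free, which frees through the library's own allocator, would mismatch allocation and deallocation.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copy a std::string into a new[]-allocated,
// NUL-terminated C string, as createMods does by hand.
// The caller owns the result and must release it with delete[].
char* newCString(const std::string& s) {
  char* buf = new char[s.size() + 1];
  std::memcpy(buf, s.data(), s.size());  // copy the UTF-8 bytes as-is
  buf[s.size()] = '\0';                  // C strings need a terminator
  return buf;
}
```

Inside the loop this would read, for example, mods[i]->mod_type = newCString(type);.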

The resulting LDAPMod array is passed to ldap_modify_ext_s and works as long as I only use ASCII characters. If the string contains any other characters, I get an LDAP operations error.

I've also tried this with the function provided by the LDAP library (ldap_x_wcs_to_utf8s), but the result is the same as with the Boost conversion.

It's not the conversion itself that is wrong, because if I convert the modifications back to a std::wstring and show it in my program output, the encoding is still correct.
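A more direct check than converting back to std::wstring is to validate the exact bytes that end up in modv_strvals. Below is a minimal sketch of a UTF-8 validator following RFC 3629, assuming the values are held in std::string as returned by str8; isValidUtf8 is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8 (RFC 3629).
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool isValidUtf8(const std::string& s) {
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned int cp;
    if (c < 0x80) { ++i; continue; }                       // ASCII
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;                 // invalid lead byte
    if (i + len > n) return false;     // sequence cut off at the end
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000)) return false;  // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
    if (cp > 0x10FFFF) return false;                 // out of range
    i += len;
  }
  return true;
}
```

Raw Latin1 bytes such as "caf\xE9" fail this check, while the correctly converted "caf\xC3\xA9" passes.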

As far as I know, OpenLDAP has supported UTF-8 for a long time, so is there something else that must be done before this works?

I've looked into the OpenLDAP client/tools examples, but the UTF-8 functions provided by the library are never used there.

Update: I noticed I can insert UTF-8 characters like é into LDAP with Apache Directory Studio, and I can retrieve these values from LDAP in my C++ program. But if I write the same value back, without changing anything in that string, I get the LDAP operations error again.

yvan vander sanden

1 Answer


It turns out that my code was not wrong at all. My modifications tried to store the full name in the 'displayName' attribute as well as in 'gecos'. But 'gecos' cannot hold UTF-8 data: in the RFC 2307 NIS schema it is defined with IA5String syntax, which only allows ASCII characters.
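If an ASCII-only attribute such as 'gecos' has to be kept, a check before building the LDAPMod can catch this on the client side. A minimal sketch, assuming the value is already a UTF-8 std::string; fitsIA5String is a hypothetical helper, and what to do with a rejected value (skip it, transliterate it) is left to the application:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: IA5String only permits 7-bit ASCII, and every byte
// of a non-ASCII UTF-8 character is >= 0x80, so one such byte disqualifies
// the whole value.
bool fitsIA5String(const std::string& utf8) {
  return std::all_of(utf8.begin(), utf8.end(), [](char c) {
    return static_cast<unsigned char>(c) < 0x80;
  });
}
```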

We don't actually use gecos anymore. The value was only present because of some software we used years ago, so I removed it from the directory.

What made it hard to find was that this error did not appear in the logs, even though the log level was set to 'parse'.

Because libldap can be such a hard nut to crack, I'll include a link to the complete code of the project I'm working on. It might serve as a starting point for other programmers, since most of the code in the tutorials I have found is outdated.

https://github.com/yvanvds/yATools/tree/master/libadmintools/ldap
