iOS CFStringTransform and Đ

Question

I'm working on an iOS app in which I have to list and sort people's names. I have a problem with some special characters.

I need some clarification on Martin R's answer at https://stackoverflow.com/a/15154823/2148377, which says:

> You could use the CoreFoundation CFStringTransform function which does almost all transformations from your list. Only "đ" and "Đ" have to be handled separately:

Why this particular letter? Where does this come from? Where can I find the documentation?

Thanks a lot.

I don't see any reason why those two characters should be a particular issue. Why not post a comment on Martin R's answer to ask him what the problem is? – JeremyP, May 30 '13 at 13:17
I checked whether those chars were missing from Unicode, but they are there: http://en.wiktionary.org/wiki/%C4%91 shows that they have a code point. Maybe it's an iOS issue of some sort? – swiftcode, May 30 '13 at 13:19
Why are you stripping the combining marks when sorting? Sorting should be handled by locale, and different locales have different sorting rules. You should generally sort with `localizedCaseInsensitiveCompare:`. – Rob Napier, May 30 '13 at 13:27
@RobNapier That's what I do for sorting, but the problem occurs when I want to group names by their first letter. All letters are converted except Đ. To all of you: thanks for your quick answers! – JonathanGailliez, May 30 '13 at 13:38

Martin R · Accepted Answer · 2013-05-30T13:37:59.050

I am not 100% sure, but I think the reason can be seen in the Unicode Character Database file UnicodeData.txt: http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt

For example, the entry for "à" is

00E0;LATIN SMALL LETTER A WITH GRAVE;Ll;0;L;0061 0300;;;;N;LATIN SMALL LETTER A GRAVE;;00C0;;00C0

where field #6 is the "Decomposition mapping" into "a" + U+0300 (COMBINING GRAVE ACCENT), therefore

CFStringTransform(..., kCFStringTransformStripCombiningMarks, ...)

transforms "à" into "a".
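The effect can be reproduced outside iOS. The following is only a sketch in Python, not the CoreFoundation implementation: it assumes that stripping combining marks amounts to "decompose, drop every mark character, recompose", and the helper name `strip_combining_marks` is made up for illustration.

```python
import unicodedata

def strip_combining_marks(text):
    """Approximate a 'strip combining marks' transform (assumed behaviour)."""
    # Canonical decomposition: "à" becomes "a" + U+0300.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character whose general category is a mark (Mn, Mc, Me).
    kept = "".join(
        c for c in decomposed
        if not unicodedata.category(c).startswith("M")
    )
    # Recompose whatever is left.
    return unicodedata.normalize("NFC", kept)

# The decomposition mapping from UnicodeData.txt, as exposed by Python.
print(unicodedata.decomposition("à"))   # 0061 0300
print(strip_combining_marks("à"))       # a
```

The stripping only works because the first step has something to split off, which is exactly what the decomposition field provides.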

The entries for "Đ" and "đ" are

0110;LATIN CAPITAL LETTER D WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER D BAR;;;0111;
0111;LATIN SMALL LETTER D WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER D BAR;;0110;;0110

where field #6 is empty, so these characters do not have a decomposition into a "base character" and a "combining mark".
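This can be checked with a short Python sketch, again only an illustration and not the CoreFoundation code. The names `has_decomposition`, `fold_for_grouping` and `STROKE_FOLD` are hypothetical, and mapping "đ / Đ" to "d / D" is an application-specific choice made here for grouping names, not something taken from UnicodeData.txt.

```python
import unicodedata

# Hypothetical extra mapping for letters without a canonical decomposition.
# This is an app-level decision, not part of Unicode normalization.
STROKE_FOLD = str.maketrans({"đ": "d", "Đ": "D"})

def has_decomposition(ch):
    """True if the character has a non-empty decomposition mapping."""
    return unicodedata.decomposition(ch) != ""

def fold_for_grouping(name):
    """Strip combining marks, then apply the extra stroke mapping."""
    decomposed = unicodedata.normalize("NFD", name)
    no_marks = "".join(
        c for c in decomposed
        if not unicodedata.category(c).startswith("M")
    )
    return unicodedata.normalize("NFC", no_marks).translate(STROKE_FOLD)

print(has_decomposition("à"))   # True
print(has_decomposition("Đ"))   # False
print(fold_for_grouping("Đà"))  # Da
```

Without the extra table the "Đ" would pass through unchanged, which matches the behaviour described in the question.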

So the question remains: Which standard determines that a "normalized form" of "đ / Đ" is "d / D"?

edited May 30 '13 at 13:37

answered May 30 '13 at 13:27


Thanks a lot for this quick, illuminating response. I will investigate whether there are other letters that may cause problems. – JonathanGailliez May 30 '13 at 13:40
> So the question remains: Which standard determines that a "normalized form" of "đ / Đ" is "d / D"?
I really don't know, since this letter is used in many languages in different parts of the world. In my case it's fine to normalize it that way, but other purposes may call for something different. – JonathanGailliez May 30 '13 at 14:01
