1

My current code seems to work, except that CFShow prints the é as the escape sequence \u00e9 instead of the character itself.

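#include <CoreFoundation/CoreFoundation.h>

int main(void)
{
    /* Assumes this source file is saved as UTF-8. */
    const char *s = "This is a test of unicode support: fiancée\n";
    CFStringRef cfs = CFStringCreateWithCString(NULL, s, kCFStringEncodingUTF8);
    if (cfs == NULL)
        return 1;       /* the bytes were not valid UTF-8 */
    CFShow(cfs);
    CFRelease(cfs);     /* Create rule: the caller owns cfs */
    return 0;
}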

Output is

This is a test of unicode support: fianc\u00e9e
                                        |____|
                                           > é doesn't output properly.

How do I tell CFShow that the string is Unicode? printf handles it fine when it is a C string.

Zimm3r
  • If `CFShow` didn't already believe that the string is Unicode, it wouldn't be interpreting the characters as such and using `\u` escapes. You would see stuff like `\xC3\xA9`. – Peter Hosey Sep 07 '13 at 23:19

2 Answers

5

CFShow() is only for debugging. It's deliberately converting non-ASCII to escape codes in order to avoid ambiguity. For example, "é" can be expressed in two ways: as U+00E9 LATIN SMALL LETTER E WITH ACUTE or as U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT. If CFShow() were to emit the UTF-8 sequence, your terminal would likely present it as "é" and you wouldn't be able to tell which variant was in the string. That would undermine the usefulness of CFShow() for debugging.
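To make the ambiguity concrete, here is a small sketch in plain C that turns UTF-8 text into the same kind of `\u` escapes seen in the question's output. `escape_non_ascii` is a hypothetical helper written for this illustration. It is not how CFShow is implemented, and it assumes well-formed UTF-8 limited to the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Core Foundation: copies UTF-8 text into
   out, replacing every non-ASCII code point with a \uXXXX escape.
   Assumes well-formed input; stops at the first 4-byte sequence or stray
   continuation byte, which this sketch does not handle. */
static void escape_non_ascii(const char *utf8, char *out, size_t out_size)
{
    assert(utf8 != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)utf8;
    size_t used = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';

    /* An escape needs 6 characters plus the terminating NUL. */
    while (*p != '\0' && used + 7 <= out_size) {
        unsigned cp;
        if (*p < 0x80) {                      /* 1 byte: plain ASCII */
            out[used++] = (char)*p++;
            out[used] = '\0';
            continue;
        } else if ((*p & 0xE0) == 0xC0) {     /* 2-byte sequence */
            cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
            p += 2;
        } else if ((*p & 0xF0) == 0xE0) {     /* 3-byte sequence */
            cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6)
               | (p[2] & 0x3Fu);
            p += 3;
        } else {
            break;                            /* outside this sketch's scope */
        }
        used += (size_t)snprintf(out + used, out_size - used, "\\u%04x", cp);
    }
}
```

Given the precomposed bytes `C3 A9` (U+00E9), the helper produces `fianc\u00e9e`. Given the decomposed form, an `e` followed by bytes `CC 81` (U+0301), it produces `fiance\u0301e`. A terminal would typically render both inputs identically, which is the ambiguity the escapes are meant to remove.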

Why do you care what the output of CFShow() is, so long as you understand what the content of the string is?

Ken Thomases
  • To use it for console output without having to resort to the C library and printf. I want a Core Foundation (Mac OS X API) counterpart to writing to the console with Win API functions. – Zimm3r Sep 07 '13 at 16:19
  • @Zimm3r: That's not what `CFShow` is for. `CFShow` is for peeking at a string's contents for debugging purposes, not for regular output. You should ask a separate question about the correct way to write a CFString to stdout or stderr. Send me the link and I'll answer it. – Peter Hosey Sep 07 '13 at 23:18
  • `CFShow()` is not suitable for general printing of strings (or anything else). As I say, it's a debugging function. If you're determined to avoid C libraries and stdio, you can use `CFStringCreateExternalRepresentation()` or `CFStringGetCString()` to get a UTF-8-encoded byte buffer and use `write()` or the like to output it. – Ken Thomases Sep 07 '13 at 23:20
1

It appears to me that CFShow knows that the string is Unicode, but doesn't know how to format Unicode for the console. I doubt that you can do anything but look for an alternative, perhaps NSLog.

JWWalker
  • NSLog is Objective-C though, isn't it? I'm trying to stick to C and C++ so I can easily use it from assembly (something Apple seems to shun). – Zimm3r Sep 07 '13 at 16:21
  • Yes, `NSLog` is Objective-C. But you could write a wrapper function that takes a CFStringRef and passes it to `NSLog`, and thus confine the Objective-C to one little file. – JWWalker Sep 07 '13 at 16:30
  • I'm trying to avoid Objective-C, which seems impossible since Apple uses it so much, sadly. Oh well, now I'm trying to find out how to call it from assembly; looks like lots of objc_msgSend calls. – Zimm3r Sep 07 '13 at 16:43
  • Now that I look at it, the declaration is `extern "C" void NSLog(NSString *format, ...)`. So if you know how to call a C function with a variable number of arguments in assembly, you should be all set. – JWWalker Sep 07 '13 at 21:38
  • @Zimm3r: `NSLog` is a C function provided by the Foundation framework. Just because *most* of the framework is Objective-C classes doesn't mean it's nothing but. There are actually [rather a lot of Foundation functions](https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Miscellaneous/Foundation_Functions/). – Peter Hosey Sep 07 '13 at 23:21