Can someone please tell me how to save the byte order marker (BOM) with a file? For example, I save a text file now like:

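NSString *currentFileContent = @"This is a string of text to represent file content.";
NSString *currentFileName = @"MyFileName";
// The extension is passed as an NSString literal; a bare rtf would be an undeclared identifier.
NSString *filePath = [NSString stringWithFormat:@"%@/%@.%@", [self documentsDirectory], currentFileName, @"rtf"];
NSError *error = nil; // receives failure details from the write below
[currentFileContent writeToFile:filePath atomically:YES encoding:NSUTF16StringEncoding error:&error];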

My understanding of BOM is:

The BOM character is the "ZERO WIDTH NO-BREAK SPACE" character, U+FEFF, in the Unicode character set.

I have an iPhone application that allows users to save text to an RTF file. All works fine if I use NSUTF8StringEncoding, unless the text contains double-byte characters such as Japanese or Chinese. The simple answer would seem to be saving the file with NSUTF16StringEncoding, which is allowed in more recent RTF specs, except that Microsoft Word can only open UTF-16 files automatically if a BOM is present.

My hope is that if I can set a generic BOM, I won't need to identify the user's character set, which I have no way of knowing in advance, and users will still be able to open RTF files that contain double-byte characters.

Thanks for suggestions or insight.

DenVog

1 Answer

If you first convert the string to data and then write the data to the file, NSString's dataUsingEncoding: (like dataUsingEncoding:allowLossyConversion:) will add the BOM for you. You can write your file as follows:

NSData *data = [currentFileContent dataUsingEncoding:NSUTF16StringEncoding];
[data writeToFile:filePath options:NSDataWritingAtomic error:&error];
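
To confirm which BOM, if any, ended up at the start of the output, the first few bytes can be inspected directly. The following is a minimal sketch in plain C, which also compiles inside an Objective-C file. The names bom_kind and detect_bom are hypothetical helpers made up for illustration, not part of Foundation.

```c
#include <stddef.h>

/* Hypothetical helper types and names; nothing here is a Foundation API. */
typedef enum {
    BOM_NONE,      /* no recognized BOM at the start of the buffer */
    BOM_UTF8,      /* EF BB BF */
    BOM_UTF16_BE,  /* FE FF */
    BOM_UTF16_LE   /* FF FE */
} bom_kind;

/* Report which encoding of U+FEFF, if any, starts the buffer.
 * Only UTF-8 and UTF-16 are checked. A UTF-32 little-endian BOM
 * (FF FE 00 00) would be reported as BOM_UTF16_LE by this sketch. */
static bom_kind detect_bom(const unsigned char *bytes, size_t length) {
    if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return BOM_UTF8;
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return BOM_UTF16_BE;
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

From Objective-C, this could be called as detect_bom([data bytes], [data length]) on the NSData object before writing it out. A result of BOM_UTF16_LE means the data starts with the bytes FF FE.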
ughoavgfhw
  • Thanks for the suggestion. I see in the NSString Class Reference "This method creates an external representation (with a byte order marker, if necessary, to indicate endianness)". So I will mark this answered. Unfortunately, my core problem is not solved. When I open the file in Word, it still shows the RTF markup. I can see in TextWrangler it is saving as UTF-16 Little-Endian, but not sure what BOM is getting defined if any. – DenVog Jun 24 '11 at 16:30