11

I'm running the following code in the viewDidLoad function of a vanilla iPad single view app:

/*
 *  Print the string.  A lot.
 */
for (int i = 0; i < 300; i++) {
    NSLog(@"%d\n", i);
    NSLog(@"⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ \n");
}

The output looks like this:

2013-02-04 20:17:49.718 testplay[59585:c07] 228
2013-02-04 20:17:49.718 testplay[59585:c07] ⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ 
2013-02-04 20:17:49.719 testplay[59585:c07] 229
2013-02-04 20:17:49.719 testplay[59585:c07] ⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ 
2013-02-04 20:17:49.719 testplay[59585:c07] 230
2013-02-04 20:17:49.720 testplay[59585:c07] ⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ 
2013-02-04 20:17:49.720 testplay[59585:c07] 231
2013-02-04 20:17:49.720 testplay[59585:c07] ⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ \342\212\221 ⊒ 
2013-02-04 20:17:49.723 testplay[59585:c07] 232
2013-02-04 20:17:49.724 testplay[59585:c07] ⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ 

The octal happens almost always on the same character and the hiccup occurs randomly about 3 times per run.

While it's relatively harmless in NSLog(), it implies that unicode characters may be dealt with irregularly at some level. If there's a history to this behavior or some resource I could be looking at that would be great.

[addendum: Removed reference to how I came across this problem. Understanding why and how NSLog gets a corrupted read of a unicode character is my hope here.]

Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382
Tim Kolar
  • 175
  • 1
  • 9
  • The NSLog behavior is interesting (and reproducible). However changing the code to comparison iterations do not cause mismatched strings. It would be helpful if you posted the comparison method that is actually causing you issues (`compare:`, `isEqualToString:`, `compare:options:`, etc) – Fruity Geek Feb 05 '13 at 06:18

2 Answers2

8

Short version:

I think this happens if a UTF-8 sequence of an NSLog() output happens to fall on the boundary of the buffer of the pseudo-terminal that Xcode uses for standard error of the debugged process.

If my assumption is correct, this is only a problem of the Xcode debugger output and does not imply any Unicode problems in the application.

Long version:

If you run your app in the simulator, lsof -p <pid_of_simulated_app> shows that the standard error (file descriptor 2) is redirected to a pseudo-terminal:

# lsof -p 3251
...
testplay 3251 martin    2w     CHR               16,2     0t131     905 /dev/ttys002
...

And lsof -p <pid_of_Xcode> shows that Xcode has the same pseudo-terminal open:

# lsof -p 3202
...
Xcode   3202 martin   51u     CHR               16,2       0t0     905 /dev/ttys002
...

NSLog() writes to standard error. With the system call tracer "dtruss" one can see that Xcode reads the log message from the pseudo-terminal. For a single log message

NSLog(@"⊢ ⊣ ⊥ ⊻ ⊼ ⊂ ⊃ ⊑ ⊒ \n");

it looks like this:

# dtruss -n Xcode -t read_nocancel
 3202/0xe101:  read_nocancel(0x31, "2013-02-05 08:57:44.744 testplay[3251:11303] \342\212\242 \342\212\243 ... \342\212\222 \n\0", 0x8000)       = 82 0

But for many NSLog() statements following each other rapidly, sometimes the following happens:

# dtruss -n Xcode -t read_nocancel
...
 3202/0xd828:  read_nocancel(0x33, "2013-02-05 08:39:51.156 ...", 0x8000) = 1024 0
 3202/0xd87b:  read_nocancel(0x33, "\212\273 \342\212\274 ...", 0x8000) = 24 0

As you can see, Xcode has read 1024 bytes from the pseudo-terminal, and the next read starts with an incomplete UTF-8 sequence. In this case, Xcode "does not see" that the last byte of the first read and the first two bytes of the second read are parts of the same UTF-8 sequence. I assume that Xcode treats all 3 bytes as invalid UTF-8 sequences and prints them as octal numbers.

Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382
  • That's a very compelling explanation. It doesn't quite explain why it is usually the same character but I think that can be safely chalked up to the vagaries of buffering and process timing. Thanks! – Tim Kolar Feb 05 '13 at 13:49
0

A workaround, in your loop, put a "fflush(stderr);" after the second NSLog statement; This will force stderr to commit and write the buffer before continuing.