I am trying to read a large file on iOS with NSInputStream and split it into lines at newline characters. I don't want to use componentsSeparatedByCharactersInSet: because it uses too much memory.
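A minimal sketch of the chunked approach, written in plain C so it compiles unchanged in an Objective-C file. It assumes the bytes arrive in arbitrary chunks (for example, one chunk per `read:maxLength:` call) and carries any unterminated line over to the next chunk. Splitting on the raw byte 0x0A is safe for UTF-8: every byte inside a multi-byte sequence has its high bit set, so a newline byte can never be part of a character such as Б. The names `LineSplitter`, `splitter_init`, `splitter_feed`, `splitter_next_line`, `splitter_rest` and `splitter_free` are hypothetical, not Apple API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Apple API): buffers raw bytes from successive
   reads and hands back complete lines split on '\n' (0x0A). */
typedef struct {
    unsigned char *buf; /* received bytes not yet compacted away */
    size_t len;         /* valid bytes in buf */
    size_t cap;         /* allocated size of buf */
    size_t consumed;    /* leading bytes already returned as lines */
    size_t scanned;     /* bytes already searched for '\n' */
} LineSplitter;

static int splitter_init(LineSplitter *s) {
    s->cap = 4096;
    s->len = s->consumed = s->scanned = 0;
    s->buf = malloc(s->cap);
    return s->buf ? 0 : -1;
}

/* Append one chunk, e.g. the bytes returned by one read:maxLength: call.
   Invalidates line pointers returned earlier. */
static int splitter_feed(LineSplitter *s, const unsigned char *chunk, size_t n) {
    /* Drop bytes already handed out so only the pending tail remains. */
    memmove(s->buf, s->buf + s->consumed, s->len - s->consumed);
    s->len -= s->consumed;
    s->scanned -= s->consumed;
    s->consumed = 0;
    if (s->len + n > s->cap) {
        size_t cap = s->cap;
        while (cap < s->len + n) cap *= 2;
        unsigned char *grown = realloc(s->buf, cap);
        if (grown == NULL) return -1;
        s->buf = grown;
        s->cap = cap;
    }
    memcpy(s->buf + s->len, chunk, n);
    s->len += n;
    return 0;
}

/* Returns 1 and sets *line / *n to the next complete line (without '\n'),
   or 0 if no complete line is buffered yet. */
static int splitter_next_line(LineSplitter *s, const unsigned char **line, size_t *n) {
    unsigned char *nl = memchr(s->buf + s->scanned, '\n', s->len - s->scanned);
    if (nl == NULL) {
        s->scanned = s->len; /* don't search these bytes again */
        return 0;
    }
    size_t end = (size_t)(nl - s->buf);
    *line = s->buf + s->consumed;
    *n = end - s->consumed;
    s->consumed = s->scanned = end + 1;
    return 1;
}

/* At end of stream, once splitter_next_line returns 0: the final line,
   if the file does not end with '\n'. Returns 1 if there is one. */
static int splitter_rest(LineSplitter *s, const unsigned char **line, size_t *n) {
    *line = s->buf + s->consumed;
    *n = s->len - s->consumed;
    s->consumed = s->scanned = s->len;
    return *n > 0;
}

static void splitter_free(LineSplitter *s) {
    free(s->buf);
    s->buf = NULL;
}
```

Each returned line could then be passed to `initWithBytes:length:encoding:` with NSUTF8StringEncoding. Because only whole lines are converted, a multi-byte character that straddles two reads is reassembled before conversion.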
But because not all lines seem to be UTF-8 encoded (some may be plain ASCII, which uses the same bytes), I often get this warning:

    Incorrect NSStringEncoding value 0x0000 detected. Assuming NSASCIIStringEncoding. Will stop this compatiblity mapping behavior in the near future.
My question is: is there a way to suppress this warning, e.g. by setting a compiler flag?
Furthermore: is it safe to append/concatenate two buffer reads? Reading from the byte stream, converting each buffer to a string and then appending the strings could corrupt the result when a multi-byte character is split across two reads.
Below is an example method demonstrating that, when the two bytes of a UTF-8 character are converted separately, the byte-to-string conversion rejects both the first and the second half as invalid.
    - (void)NSInputStreamTest {
        uint8_t testString[] = {0xd0, 0x91}; // UTF-8 bytes of @"Б" (U+0411)

        // Test 1: read at most 1 byte at a time from the UTF-8 data
        uint8_t buf1[1], buf2[1];
        NSString *s1, *s2, *s3;
        NSInteger c1, c2;
        NSInputStream *inStream = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];
        [inStream open];
        c1 = [inStream read:buf1 maxLength:1];
        // 0xd0 alone is an incomplete UTF-8 sequence, so this returns nil
        s1 = [[NSString alloc] initWithBytes:buf1 length:1 encoding:NSUTF8StringEncoding];
        NSLog(@"Test 1: Read %ld byte(s): %@", (long)c1, s1);
        c2 = [inStream read:buf2 maxLength:1];
        // 0x91 alone is a stray continuation byte, so this returns nil too
        s2 = [[NSString alloc] initWithBytes:buf2 length:1 encoding:NSUTF8StringEncoding];
        NSLog(@"Test 1: Read %ld byte(s): %@", (long)c2, s2);
        s3 = [s1 stringByAppendingString:s2]; // s1 is nil, so this message returns nil
        NSLog(@"Test 1: Concatenated: %@", s3);
        [inStream close];

        // Test 2: read at most 2 bytes at a time from the UTF-8 data
        uint8_t buf4[2];
        NSString *s4;
        NSInteger c4;
        NSInputStream *inStream2 = [[NSInputStream alloc] initWithData:[[NSData alloc] initWithBytes:testString length:2]];
        [inStream2 open];
        c4 = [inStream2 read:buf4 maxLength:2];
        // Both bytes arrive together, so the conversion succeeds
        s4 = [[NSString alloc] initWithBytes:buf4 length:2 encoding:NSUTF8StringEncoding];
        NSLog(@"Test 2: Read %ld byte(s): %@", (long)c4, s4);
        [inStream2 close];
    }
Output:

    2013-02-10 21:16:23.412 Test[11144:c07] Test 1: Read 1 byte(s): (null)
    2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Read 1 byte(s): (null)
    2013-02-10 21:16:23.413 Test[11144:c07] Test 1: Concatenated: (null)
    2013-02-10 21:16:23.413 Test[11144:c07] Test 2: Read 2 byte(s): Б
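When fixed-size chunks are converted directly, a common remedy is to convert only the bytes up to the last complete UTF-8 sequence and keep the remainder for the next read. A minimal sketch in plain C (usable from Objective-C) follows. `utf8_complete_prefix` is a hypothetical name, and the function only inspects the trailing bytes of the buffer, so it assumes everything before them is well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes of buf form complete
   UTF-8 sequences. The remaining len - result bytes (at most 3) start a
   character that was split across reads and should be kept for the next one.
   Only the tail is inspected; earlier bytes are assumed to be valid UTF-8. */
static size_t utf8_complete_prefix(const unsigned char *buf, size_t len) {
    size_t i = len;
    size_t cont = 0;
    /* Walk back over up to 3 continuation bytes (10xxxxxx). */
    while (i > 0 && cont < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0) return len; /* no lead byte found: nothing to hold back */
    unsigned char lead = buf[i - 1];
    size_t need; /* sequence length announced by the lead byte */
    if (lead < 0x80) need = 1;                /* ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2; /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0) need = 3; /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0) need = 4; /* 11110xxx */
    else return len; /* not a lead byte: let the decoder reject it */
    /* cont + 1 bytes of this last sequence are present in buf */
    return (cont + 1 < need) ? i - 1 : len;
}
```

In the read loop, the first `utf8_complete_prefix(buf, len)` bytes could be passed to `initWithBytes:length:encoding:`, and the leftover bytes moved to the front of the buffer (for example with `memmove`) before the next `read:maxLength:`. With the test data above, a one-byte read of 0xd0 would yield a prefix of 0, so nothing is converted until 0x91 arrives.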