This has me stumped. I'm trying to read a 6MB CSV file on iOS line by line. I've tried plain C file pointers and NSInputStream polling, but settled on the following, which felt the cleanest. With all three approaches, a seemingly random block of reads returns success but fills the buffer with null bytes. I say "random", but it is consistent: the reads stop working at the exact same point every time I re-run the program, and the number of bad reads is suspicious (more on that below).

- (id)initWithFileAtPath:(NSString *)path {
   if ((self = [super init])) {
      filePath = [path copy];
      queue = [[NSOperationQueue alloc] init];
      queue.maxConcurrentOperationCount = 1;
      buffer = [[NSMutableString alloc] init];
      bytes = malloc(CHUNK_SIZE * sizeof(UTF8Char));
   }

   return self;
}

- (void)dealloc {
   [filePath release];
   [queue release];
   [buffer release];
   free(bytes);
   [super dealloc];
}

- (void)stream:(NSInputStream *)stream handleEvent:(NSStreamEvent)eventCode {
   switch (eventCode) {
      case NSStreamEventOpenCompleted:
         break;
      case NSStreamEventHasBytesAvailable:
         [queue addOperationWithBlock:^{
            [self readChunk: stream];
            [self drainBuffer];
         }];
         break;
      case NSStreamEventEndEncountered:
         if ([buffer length] > 0) {
            [delegate reader:self didReadLine:[NSString stringWithString:buffer]];
            [buffer setString:@""];
         }

         [stream close];
         [stream removeFromRunLoop:[NSRunLoop currentRunLoop]
                           forMode:NSDefaultRunLoopMode];

         [stream release];

         [delegate readerDidFinishReading:self];

         break;
      default:
         NSLog(@"StreamReader: event %d", eventCode);
         break;
   }
}

- (void)enumerateLines {
   NSInputStream *stream = [[NSInputStream alloc] initWithFileAtPath:filePath];
   stream.delegate = self;

   [stream scheduleInRunLoop:[NSRunLoop currentRunLoop]
                     forMode:NSDefaultRunLoopMode];

   [stream open];
}

- (void)readChunk: (NSInputStream*)stream {
   NSInteger readSize = [stream read:bytes maxLength:CHUNK_SIZE];
   if (readSize) {
      if (bytes[0] == '\0') {
         NSLog(@"null buffer %d", readSize);
      }
      NSString *string = [[NSString alloc] initWithBytes:bytes
                                                  length:readSize
                                                encoding:NSUTF8StringEncoding];
      [buffer appendString:string];
      [string release];
   } else {
      NSLog(@"StreamReader: read zero bytes");
   }
}

- (void)drainBuffer {
   static NSCharacterSet *newlines = nil;
   if (newlines == nil) {
      newlines = [NSCharacterSet newlineCharacterSet];
   }

   NSRange newlinePos;
   while ((newlinePos = [buffer rangeOfCharacterFromSet:newlines]).location != NSNotFound) {
      NSString *line = [buffer substringToIndex:newlinePos.location];

      // remove the line from the buffer along with line separator
      [buffer deleteCharactersInRange: (NSRange){0, [line length]}];
      while ([buffer length] > 0 && [newlines characterIsMember:[buffer characterAtIndex:0]]) {
         [buffer deleteCharactersInRange:(NSRange){0, 1}];
      }

      [delegate reader:self didReadLine: line];
   }
}

While reading the 6MB file, I get a series of 96 "bad reads" twice when CHUNK_SIZE is 1024. If CHUNK_SIZE is 512, each series is 192 "bad reads" long. Either way that is 98,304 bytes (96KB) of zeros per series. What do I mean by "bad reads"? The NSInputStream read message returns success, and no error event occurs in the delegate callback, yet the bytes buffer contains only null values.
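For anyone wanting to reproduce the count with plain C file pointers, the check can be sketched roughly as follows. This is a minimal sketch and not the code from the app: `count_null_chunks` is a hypothetical name, and it treats a chunk as "bad" only when every byte in it is zero (the Objective-C code above only looks at the first byte). If the zero-filled region is a fixed span of bytes, the count it returns should double when the chunk size is halved.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical diagnostic: read `path` in chunks of `chunk_size` bytes and
   return how many chunks came back entirely zero-filled, or -1 on error. */
long count_null_chunks(const char *path, size_t chunk_size) {
    if (chunk_size == 0) return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    unsigned char *chunk = malloc(chunk_size);
    if (chunk == NULL) {
        fclose(fp);
        return -1;
    }

    long bad = 0;
    size_t n;
    while ((n = fread(chunk, 1, chunk_size, fp)) > 0) {
        /* Scan until the first non-zero byte in this chunk. */
        size_t i = 0;
        while (i < n && chunk[i] == 0) i++;
        if (i == n) bad++;  /* every byte read was zero */
    }

    int failed = ferror(fp);
    free(chunk);
    fclose(fp);
    return failed ? -1 : bad;
}
```

Calling it with chunk sizes of 1024 and 512 on the same file gives two counts to compare; a mismatch in bad count times chunk size would point away from a fixed-size region.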

  • iOS 7.0.4, iPad 2
  • does NOT happen on desktop
  • does NOT happen in simulator
  • decreasing file size to approx. 1MB "fixes" the problem on the iPad

It's probably worth noting that I instantiate the reader class on the main UI thread.

So... am I doing something subtly (or not subtly) wrong here? Or have I uncovered some sort of obscure iOS bug?

Josh Kropf
  • I'd add oodles of NSLog statements, and plod through them trying to find my bug. The odds of the file system having serious problems at this point are extremely remote. – David H Jan 26 '14 at 23:02

1 Answer

At least one problem is that you are reading arbitrary chunks of a UTF-8 stream, then assuming that what you get back is coherent. If a chunk "breaks" in the middle of a multi-byte UTF-8 sequence, it's going to cause a slew of problems. If you want to construct strings from partial data, your algorithm is going to need rework to prevent this, and it's not a trivial thing to do.
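One way to approach the chunk-boundary problem is to decode only the part of each chunk that ends on a complete UTF-8 sequence and carry the leftover bytes into the next read. The following is a minimal C sketch under that assumption; `utf8_complete_prefix` is a hypothetical helper name, and it only checks where sequences begin and end, not whether the bytes are otherwise valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: return how many leading bytes of buf[0..len) can be
   decoded without cutting a UTF-8 sequence in half. The remaining bytes
   (at most 3) should be prepended to the next chunk before decoding. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len) {
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return len;  /* no lead byte in range; leave it to the decoder */

    unsigned char lead = buf[i - 1];
    size_t need;             /* total length the lead byte announces */
    if ((lead & 0x80) == 0x00)      need = 1;  /* 0xxxxxxx: ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2;  /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0) need = 3;  /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0) need = 4;  /* 11110xxx */
    else return len;         /* malformed input; leave it to the decoder */

    size_t have = len - (i - 1);  /* bytes present from the lead byte on */
    return have < need ? i - 1 : len;
}
```

In the reader from the question, this would mean passing only the returned number of bytes to `initWithBytes:length:encoding:` and moving the remainder to the front of the buffer before the next `read:maxLength:` call. For pure ASCII input the function always returns `len`, so it changes nothing in that case.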

David H
  • To eliminate that as a culprit, I converted the file to ASCII, removing any non-convertible characters with `iconv -f utf8 -t ascii -c myfile.csv >clean.csv`. The behavior I describe in my post occurs with a file that contains only ASCII characters. – Josh Kropf Jan 27 '14 at 02:42
  • Why are you using an NSOperationQueue and not just a serial dispatch queue? NSOperationQueues allow multiple operations to run at once. Your current solution just seems odd. Anyway, as a last suggestion, put the file on a public site like Dropbox, spin up a simple demo app that just downloads it (but still has the problem on an iPad), and I will look at it (maybe someone else will too). It pays to build up points on this site, as a 50 or 100 pt bounty will go a long way toward motivating people. I only have an iPad 3, so it may or may not work out of the box. – David H Jan 27 '14 at 16:39