2

I'm trying to parse some HTML. I use stringWithContentsOfURL to get the HTML, then try to copy it into a character array so I can parse it, but the app crashes with EXC_BAD_ACCESS when getCString is called. Here is the relevant code:

- (void)parseStoryWithURL:(NSURL *)storyURL
{
    _paragraphs = [[NSMutableArray alloc] initWithCapacity:10];
    _read = NO;

    NSError* error = nil;
    NSString* originalFeed = [NSString stringWithContentsOfURL:storyURL encoding:NSUTF8StringEncoding error:&error];

    _i = originalFeed.length;
    char* entireFeed = malloc(_i*sizeof(char));
    char* current = entireFeed;
    char* lagger;
    char* recentChars = malloc(7);
    BOOL collectRecent = NO;
    BOOL paragraphStarted = NO;
    BOOL paragraphEnded = NO;
    int recentIndex = 0;
    int paragraphSize = 0;

    NSLog(@"original Feed: %@", originalFeed);

    _read = [originalFeed getCString:*entireFeed maxLength:_i encoding:NSUTF8StringEncoding];

I've also tried passing the 'current' pointer to getCString, but it behaves the same. From what I've read, this error is typically thrown when you try to read from deallocated memory. I'm programming for iOS 5 with memory management. The line before the call prints the HTML to the log and everything is fine. Help would be appreciated. I need to get past this error so I can test/debug my HTML parsing algorithms.

PS: If someone with enough reputation is allowed to, please add "getCString" as a tag. Apparently no one uses this function :(

evanmcdonnal

3 Answers

1

A few strange things:

char* entireFeed[_i]; allocates an array of char*, not an array of char. I suspect you want char entireFeed[_i] or char *entireFeed = malloc(_i*sizeof(char));

getCString takes a char* as a first parameter, that is, you should send it entireFeed instead of *entireFeed.

Also, note that the (UTF-8) encoding may add bytes to the result, so allocating the buffer the exact size of the input may cause the method to return NO (buffer too small). You should really use [originalFeed UTF8String] instead.
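To make the size mismatch concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper names `utf8_code_points` and `utf8_buffer_size` are made up for illustration and are not part of any Apple API. It compares the number of characters in a UTF-8 string with the number of bytes a buffer needs to hold it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts Unicode code points in a UTF-8 string by
   skipping continuation bytes (those of the form 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: bytes needed to store the string, including the
   terminating NUL. */
size_t utf8_buffer_size(const char *s)
{
    return strlen(s) + 1;
}
```

For the string "café" encoded as UTF-8, the first function reports 4 characters while the second reports 6 bytes, so a buffer sized from the character count is too small as soon as the text contains anything outside ASCII.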

Joachim Isaksson
  • malloc is a start but I think the buffer is too small. getCString is returning NO. – evanmcdonnal Feb 05 '12 at 22:18
  • @evanmcdonnal Yes, I got that part wrong, it won't truncate the output, it will just fail with NO. Any particular reason not to just use the UTF8String method? – Joachim Isaksson Feb 05 '12 at 22:25
  • No, I just didn't know about it. I'm new to iOS/Cocoa and am most acclimated to processing strings with char arrays in C++. I'm probably going to try doing it with UTF8 now, but I need to think about it for a minute. – evanmcdonnal Feb 05 '12 at 22:48
  • @evanmcdonnal NSString's UTF8String method returns exactly that, a const char*. – Joachim Isaksson Feb 05 '12 at 22:51
1

Try explicitly malloc'ing entireFeed with a length of _i (I'm not 100% certain that is enough, as NSUTF8StringEncoding can produce multi-byte characters) instead of the wacky char * entireFeed[_i] thing you're doing.

I can't imagine char * entireFeed[_i] is working at run-time (and instead, you're passing a NULL pointer to your getCString method).

Michael Dautermann
1

There are several issues with your code - you're passing the wrong pointers and not reserving enough space. Probably the easiest is to use UTF8String instead:

char *entireFeed = strdup([originalFeed UTF8String]);

At the end you'll have to free the string with free(entireFeed) though. If you don't modify it you can use

const char *entireFeed = [originalFeed UTF8String];

directly.
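Since the goal is to walk the text with char pointers, a read-only pointer is usually enough. The following is only a sketch in plain C (valid Objective-C as well); `count_tag` is a hypothetical helper, not something from Foundation, and it does a naive substring search rather than real HTML parsing:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of `tag` in `html` by walking
   the buffer with a read-only pointer. Nothing is written, so a
   const char * (such as the one UTF8String returns) is sufficient. */
size_t count_tag(const char *html, const char *tag)
{
    size_t tag_len = strlen(tag);
    size_t count = 0;

    if (tag_len == 0) {
        return 0; /* an empty tag would otherwise match forever */
    }
    for (const char *p = strstr(html, tag); p != NULL;
         p = strstr(p + tag_len, tag)) {
        ++count;
    }
    return count;
}
```

It could be called as `count_tag(entireFeed, "<p>")`. If the parsing needs to modify the buffer in place, the strdup variant above is the one to use.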

If you want to use getCString, you'll need to determine the length first - which has to include the termination character as well as extra space for encoded characters, so something like:

NSUInteger len = [originalFeed lengthOfBytesUsingEncoding: NSUTF8StringEncoding] + 1;
char entireFeed[len];
[originalFeed getCString:entireFeed maxLength:len encoding:NSUTF8StringEncoding];
Simon Urbanek
  • Yeah I've made some corrections to my code. I now use malloc, but the getCString call returns NO, so I'm guessing the buffer is too small. If I use UTF8String, can I walk and process it with char pointers like I could a char array? – evanmcdonnal Feb 05 '12 at 22:16
  • Yes, it is a char array, except that you can't write to it. Your _i is still wrong, so the result won't fit: `length` counts UTF-16 code units, not bytes, and you're missing space for the terminating NUL. As in my answer, you must use `lengthOfBytesUsingEncoding` and add one for the terminating NUL. – Simon Urbanek Feb 05 '12 at 22:21