Is running a nested NSScanner the most efficient method for parsing out a string of repeating elements or can the scanning be done in one pass?
I have a string that is returned from a command-line call (via NSTask) to Apple's Compressor. The actual string has no line breaks; the breaks below are only there so the example is readable without scrolling:
<jobStatus name="compressor.motn" submissionTime="12/4/10 3:56:16 PM"
sentBy="localuser" jobType="Compressor" priority="HighPriority"
timeElapsed="32 second(s)" timeRemaining="0" timeElapsedSeconds="32"
timeRemainingSeconds="0" percentComplete="100" resumePercentComplete="100"
status="Successful" jobid="CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E"
batchid="0C9041F5-A499-4D00-A26A-D7508EAF3F85" /jobStatus>
These blocks repeat within the same string, so the returned string can contain anywhere from zero to n of them:
<jobstatus .... /jobstatus><jobstatus .... /jobstatus>
<jobstatus .... /jobstatus>
In addition, the string may include other tags that are of no significance to my code (batchstatus in this example):
<jobstatus .... /jobstatus><batchstatus .... /batchstatus>
<jobstatus .... /jobstatus>
This is NOT an XML document that gets returned, merely a series of status blocks that happen to be wrapped in XML-like tags. None of the blocks are nested; they are all sequential. I have no control over the data being returned.
My goal, which my current working code achieves, is to parse the string into "jobs", each of which is a dictionary of the details within a jobstatus block. Any other blocks (such as batchstatus) and any other text are ignored. I am only concerned with the contents of the jobstatus blocks.
NSScanner * jobScanner = [NSScanner scannerWithString:dataAsString];
NSScanner * detailScanner = nil;
NSMutableDictionary * jobDictionary = [NSMutableDictionary dictionary];
NSMutableArray * jobsArray = [NSMutableArray array];
NSString * key = @"";
NSString * value = @"";
NSString * jobStatus = @"";
NSCharacterSet * whitespace = [NSCharacterSet whitespaceCharacterSet];
while ([jobScanner isAtEnd] == NO) {
    if ([jobScanner scanUpToString:@"<jobstatus " intoString:NULL] &&
        [jobScanner scanUpToCharactersFromSet:whitespace intoString:NULL] &&
        [jobScanner scanUpToString:@" /jobstatus>" intoString:&jobStatus]) {
        detailScanner = [NSScanner scannerWithString:jobStatus];
        [jobDictionary removeAllObjects];
        while ([detailScanner isAtEnd] == NO) {
            if ([detailScanner scanUpToString:@"=" intoString:&key] &&
                [detailScanner scanString:@"=\"" intoString:NULL] &&
                [detailScanner scanUpToString:@"\"" intoString:&value] &&
                [detailScanner scanString:@"\"" intoString:NULL]) {
                [jobDictionary setObject:value forKey:key];
                //NSLog(@"Key:(%@) Value:(%@)", key, value);
            }
        }
        [jobsArray addObject:
            [NSDictionary dictionaryWithDictionary:jobDictionary]];
    }
}
NSLog(@"Jobs Dictionary:%@", jobsArray);
The above code produces the following log output:
Jobs Dictionary:(
    {
        batchid = "0C9041F5-A499-4D00-A26A-D7508EAF3F85";
        jobType = Compressor;
        jobid = "CD4046D8-CDC1-4F2D-B9A8-460DF6AF184E";
        name = "compressor.motn";
        percentComplete = 100;
        priority = HighPriority;
        resumePercentComplete = 100;
        sentBy = localuser;
        status = Successful;
        submissionTime = "12/4/10 3:56:16 PM";
        timeElapsed = "32 second(s)";
        timeElapsedSeconds = 32;
        timeRemaining = 0;
        timeRemainingSeconds = 0;
    }
)
Here's my concern. My code scans through the string and, each time it finds a block, scans through that block again to build a dictionary that is added to an array. This effectively means the string gets walked twice. Since this happens every 15 to 30 seconds or so and the string could contain hundreds of jobs, I see it as a potential CPU and memory hog. The app running this could be on the same machine as Compressor (which is already a memory and CPU hog), so I don't want to add any burden if I don't have to.
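To make the question concrete, here is a rough, untested sketch of what I imagine a one-pass version might look like, using a single scanner that reads the attributes in place instead of handing each block to a second scanner. The helper name ParseJobStatuses is made up for illustration. It assumes every attribute has the form key="value", and it relies on NSScanner being case-insensitive and skipping whitespace by default. I have not measured whether it is actually any cheaper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of my current code): collects every
// jobstatus block in one pass over the string with a single scanner.
static NSArray *ParseJobStatuses(NSString *dataAsString) {
    NSScanner *scanner = [NSScanner scannerWithString:dataAsString];
    NSMutableArray *jobs = [NSMutableArray array];
    NSString *openTag = @"<jobstatus ";
    NSString *closeTag = @"/jobstatus>";

    while (![scanner isAtEnd]) {
        // Skip anything before the next opening tag. The return value is
        // ignored because it is NO when the scanner is already at the tag.
        [scanner scanUpToString:openTag intoString:NULL];
        if (![scanner scanString:openTag intoString:NULL]) {
            break; // no further jobstatus blocks
        }

        NSMutableDictionary *job = [NSMutableDictionary dictionary];
        // Read key="value" pairs until the closing tag is consumed.
        while (![scanner scanString:closeTag intoString:NULL]) {
            NSString *key = nil;
            NSString *value = @""; // stays empty for key=""
            if (![scanner scanUpToString:@"=" intoString:&key] ||
                ![scanner scanString:@"=\"" intoString:NULL]) {
                break; // malformed or truncated block: stop reading it
            }
            [scanner scanUpToString:@"\"" intoString:&value];
            if (![scanner scanString:@"\"" intoString:NULL]) {
                break; // missing closing quote
            }
            [job setObject:value forKey:key];
        }
        [jobs addObject:job];
    }
    return jobs;
}
```

It would be called as `NSArray *jobsArray = ParseJobStatuses(dataAsString);`. Other blocks such as batchstatus are never inspected, because the outer scan jumps straight to the next jobstatus opening tag.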
Is there a better way to use NSScanner as I walk through the string to get the data?
Any advice or recommendations would be much appreciated!