
I would like to parse a string like this:
NSString *str = @"firstcolumn second column text Third Column Text";

I have three columns of text, and each column may contain text with spaces.
I know how wide the columns are: col1 is 10 characters long, col2 is 20, and col3 is 30.
I know I could use NSMakeRange(0, 10), NSMakeRange(10, 20), and NSMakeRange(30, 30) to cut out the columns.

I get "out of range" crashes because the length varies: the text in a column doesn't have to fill the column's maximum width.

Any ideas how to do this?

NSString *str = @"A000 B11 This is text description This column is a longer Text description";
// "A000" column can be up to 10 chars long
// "B11" column can be up to 20 chars long
// "This is text description" column can be up to 30 chars long
// "This column is a longer Text description" is the rest of the line
NSString *code1 = [str substringWithRange:NSMakeRange(0, 10)];
NSString *code2 = [str substringWithRange:NSMakeRange(10, 20)];
NSString *shorttext = [str substringWithRange:NSMakeRange(30, 30)];
NSString *longtext = [str substringFromIndex:60];
// Each call raises NSRangeException when str ends before the requested range does.

In the example above, I would like to get code1 = A000. This column can be up to 10 characters long, but as you can see it doesn't have to be. The same goes for the other columns: code2, shorttext, and longtext. How can I do this?

SMA2012
  • If you're crashing, can you share the code so someone can help you? – bryanmac Nov 07 '12 at 00:05
  • So a column is not the same length every time? And you do have spaces within the columns' text? And you don't have any separators? – calimarkus Nov 07 '12 at 00:06
  • What you are talking about is only valid if the last column can be variable length. If the first 2 columns can be variable length then you must have some delimiter between each column. – rmaddy Nov 07 '12 at 00:07
  • I added some pseudo code in the original post. Hope it clears it up a bit. – SMA2012 Nov 07 '12 at 00:21
  • You just confirmed the problem. The first two columns are not fixed length so you can't use a range. Can the 1st two columns have spaces? If not then this is doable. If the first two columns can each have spaces and they are not fixed length then this is impossible without some column delimiter. – rmaddy Nov 07 '12 at 00:24
  • @rmaddy, you are correct. But is there a way I can load the string into a buffer/stream, then go through it to parse it out? The first 2 columns will have no spaces, but the problem is that I have 4 columns: the 3rd column is a brief description and the 4th column is the full text description. – SMA2012 Nov 07 '12 at 00:28
  • If the first two have no spaces then that is easy. But what separates columns 3 and 4? And why are you just now bringing up that there is a 4th column? If you want good help, you need to provide good, complete information up front. – rmaddy Nov 07 '12 at 00:30
  • I thought the problem was in NSRange, so I thought I could solve it by showing a small snippet of pseudo code. I edited the post to add the 4th column. Only a space separates the columns. I just know the max length of each column. – SMA2012 Nov 07 '12 at 00:35

1 Answer


If I understand correctly, you have an input NSString str which consists of three concatenated strings: col1, col2, and col3. Additionally, you know the following constraints about the problem:

  • col1 is between 0 and 10 characters
  • col2 is between 0 and 20 characters
  • col3 is between 0 and 30 characters

and want to recover these strings from str. Put differently, you want to uniquely determine col1, col2, and col3 so that str is equal to

[NSString stringWithFormat:@"%@%@%@", col1, col2, col3];

Unfortunately, as others have commented, this is not possible without modifying the problem. To see why not, consider the case where

str = @"a";

In this case, you know that one of the component strings (col1, col2, or col3) is equal to @"a" and the other two are equal to @"". However, it's not possible to determine which. If, for example, col1 = @"a" and col2 and col3 are both equal to @"", then

[NSString stringWithFormat:@"%@%@%@", col1, col2, col3]

evaluates to

@"a"

as desired. However, this is also true if col1 and col2 are equal to @"" and col3 = @"a", since

[NSString stringWithFormat:@"%@%@%@", col1, col2, col3]

still evaluates to

@"a"

The problem here is not that the component strings can be empty, but rather that their lengths can vary over a range, so more than one split of str satisfies the constraints.
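To make the counting argument concrete, here is a minimal sketch that enumerates every admissible triple of column lengths. `CountPossibleSplits` is a hypothetical helper written only for this illustration; it counts splits and looks at nothing but the string's length.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how many ways `str` can be cut into three
// consecutive pieces whose lengths do not exceed max1, max2 and max3.
// A count above 1 means the columns cannot be recovered uniquely.
NSUInteger CountPossibleSplits(NSString *str, NSUInteger max1, NSUInteger max2, NSUInteger max3)
{
    NSUInteger length = str.length;
    NSUInteger count = 0;
    for (NSUInteger len1 = 0; len1 <= MIN(max1, length); len1++) {
        for (NSUInteger len2 = 0; len2 <= MIN(max2, length - len1); len2++) {
            // The third piece is whatever is left over.
            NSUInteger len3 = length - len1 - len2;
            if (len3 <= max3) {
                count++;
            }
        }
    }
    return count;
}
```

For @"a" with limits 10, 20, and 30 it returns 3, matching the cases above. It returns 1 only when the limits force a single split, for example for a 60-character string with those same limits.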

If we constrained the problem so that the lengths were exact:

  • col1, which is 10 characters long
  • col2, which is 20 characters long
  • col3, which is 30 characters long

it would then be possible to recover col1, col2, and col3 from str with the following function:

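// Splits the first 60 characters of str into fixed-width columns of 10, 20, and 30 characters.
// Passing NULL for an output parameter skips that column.
void GetColumnsFromString(NSString *str, NSString * __autoreleasing *col1, NSString * __autoreleasing *col2, NSString * __autoreleasing *col3)
{
    // substringWithRange: raises NSRangeException if str is shorter than 60 characters.
    NSCParameterAssert(str.length >= 60);
    if (col1) {
        *col1 = [str substringWithRange:NSMakeRange(0, 10)];
    }
    if (col2) {
        *col2 = [str substringWithRange:NSMakeRange(10, 20)];
    }
    if (col3) {
        *col3 = [str substringWithRange:NSMakeRange(30, 30)];
    }
}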
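If the real input is fixed-width but lines can end early (for example because trailing padding spaces were stripped), one possible variant pads each line back to 60 characters before slicing, so no range is out of bounds. This is a sketch under that assumption; `ColumnsFromPaddedLine` is a hypothetical name, and it only helps if every column really is space-padded to its full width.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a fixed-width line whose trailing spaces may
// have been trimmed. Assumes columns are space-padded to widths 10, 20, 30.
NSArray<NSString *> *ColumnsFromPaddedLine(NSString *line)
{
    const NSUInteger widths[] = {10, 20, 30};
    const NSUInteger totalWidth = 60;
    // Restore trimmed trailing padding so every range below is in bounds.
    // (This call also truncates lines longer than 60 characters.)
    NSString *padded = [line stringByPaddingToLength:totalWidth withString:@" " startingAtIndex:0];
    NSMutableArray<NSString *> *columns = [NSMutableArray array];
    NSUInteger location = 0;
    for (NSUInteger i = 0; i < 3; i++) {
        NSString *field = [padded substringWithRange:NSMakeRange(location, widths[i])];
        // Drop the padding spaces around each field.
        [columns addObject:[field stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]];
        location += widths[i];
    }
    return columns;
}
```

Because each field is trimmed afterwards, a column that consists only of padding comes back as an empty string.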

Another, better solution, as mentioned in the comments, is to use a "special" separator character in str to mark the boundaries between the component strings. Call that character sep (an NSString holding one character). If we constructed str like this

str = [NSString stringWithFormat:@"%@%@%@%@%@", col1, sep, col2, sep, col3];

and we constrained col1, col2, and col3 not to contain sep, then we could parse them as follows:

NSArray *cols = [str componentsSeparatedByString:sep];
col1 = cols[0];
col2 = cols[1];
col3 = cols[2];

The situation is no different if you choose the space character as sep, provided that none of col1, col2, and col3 can contain a space.
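A minimal sketch of the separator approach, assuming the hypothetical choice of the ASCII unit separator (U+001F) as the separator; any character guaranteed never to appear inside a column would work the same way. `JoinColumns` and `SplitColumns` are illustrative names, not part of any library.

```objc
#import <Foundation/Foundation.h>

// Assumption: U+001F (ASCII unit separator) never appears inside a column.
static NSString *const kColumnSeparator = @"\x1F";

// Builds the line by placing the separator between consecutive columns.
NSString *JoinColumns(NSArray<NSString *> *columns)
{
    return [columns componentsJoinedByString:kColumnSeparator];
}

// Recovers the columns; empty columns are kept as empty strings.
NSArray<NSString *> *SplitColumns(NSString *str)
{
    return [str componentsSeparatedByString:kColumnSeparator];
}
```

Empty columns survive the round trip because componentsSeparatedByString: keeps empty components, which is another way of seeing that empty columns are not the problem.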

Edit: You added more information about the input string and the desired output:

Rather than three, there are four component strings: col1, col2, col3, and col4. We have some information about them:

  • col1 is between 0 and 10 characters long
  • col1 does not contain the space character
  • col2 is between 0 and 20 characters long
  • col2 does not contain the space character
  • col3 is between 0 and 30 characters long
  • col3 MAY contain the space character
  • col4 isn't constrained in length
  • col4 MAY contain the space character

Additionally, the four strings are separated by spaces in their concatenation. So your goal is to uniquely determine col1, col2, col3, and col4 so str is equal to

[NSString stringWithFormat:@"%@ %@ %@ %@", col1, col2, col3, col4];

You can use an NSScanner to extract col1 and col2 in this case:

NSScanner *scanner = [NSScanner scannerWithString:str];
NSCharacterSet *spaceCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" "];
NSString *col1 = nil, *col2 = nil;
[scanner scanUpToCharactersFromSet:spaceCharacterSet intoString:&col1];
[scanner scanUpToCharactersFromSet:spaceCharacterSet intoString:&col2];

At this point, it's possible to extract the string remainder which contains the two final strings col3 and col4 separated by a space:

NSCharacterSet *emptyCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@""];
NSString *remainder = nil;
[scanner scanUpToCharactersFromSet:emptyCharacterSet intoString:&remainder];
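The scanner steps above can be collected into a single function. This is a sketch; `ScanFirstTwoColumns` is a hypothetical name, and it relies on NSScanner's default `charactersToBeSkipped` (whitespace and newlines) to step over the space before each column.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: extracts col1 and col2 (which contain no spaces) and
// returns the unparsed remainder, which still holds col3 and col4.
NSString *ScanFirstTwoColumns(NSString *str, NSString * __autoreleasing *col1, NSString * __autoreleasing *col2)
{
    NSScanner *scanner = [NSScanner scannerWithString:str];
    NSCharacterSet *spaces = [NSCharacterSet characterSetWithCharactersInString:@" "];
    NSString *first = nil, *second = nil, *remainder = nil;
    // The scanner skips leading whitespace by default, so the space
    // between columns is consumed before each scan.
    [scanner scanUpToCharactersFromSet:spaces intoString:&first];
    [scanner scanUpToCharactersFromSet:spaces intoString:&second];
    // Scanning up to an empty set consumes everything that is left.
    [scanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@""] intoString:&remainder];
    if (col1) {
        *col1 = first;
    }
    if (col2) {
        *col2 = second;
    }
    return remainder;
}
```

With the example line from the question, it yields col1 = @"A000", col2 = @"B11", and a remainder of @"This is text description This column is a longer Text description".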

At this point, you are back in the same sort of situation I described at the beginning. You have a string (remainder) which consists of two component strings (col3 and col4) which are separated by a space. The only way to detect the border between these two strings is that space.

However, col3 may contain spaces. If it could not, then you could simply scan along until the next space was reached and extract the contents between the beginning and that space into col3 and the rest into col4.

In addition, col4 may also contain spaces. If it could not, then you could scan from the end of remainder until the first space from the end was reached, extract that range into col4 and the rest into col3.
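To see the ambiguity directly, a sketch can list every space in remainder that could be the col3/col4 boundary while keeping col3 within its 30-character limit. `CandidateCol3Values` is a hypothetical helper for illustration only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every possible value of col3 obtained by
// splitting `remainder` at one space, with col3 at most maxCol3 characters.
// More than one result means the split is ambiguous.
NSArray<NSString *> *CandidateCol3Values(NSString *remainder, NSUInteger maxCol3)
{
    NSMutableArray<NSString *> *candidates = [NSMutableArray array];
    NSUInteger limit = MIN(maxCol3, remainder.length);
    for (NSUInteger i = 0; i <= limit && i < remainder.length; i++) {
        // A space at index i could be the boundary: col3 = [0, i), col4 = (i, end).
        if ([remainder characterAtIndex:i] == ' ') {
            [candidates addObject:[remainder substringToIndex:i]];
        }
    }
    return candidates;
}
```

For the remainder from the question's example, it returns five candidates, from @"This" up to @"This is text description This", so the 30-character limit alone does not single out one answer.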

Nate Chandler
  • Thanks for the detailed answer. I can't insert separators between strings. The document is generated by some third party agency, but I believe they may have an XML version of the file, which may be another option for me, I am sure there are some XML parsers out there. – SMA2012 Nov 07 '12 at 00:42
  • @Nate You stated *The situation is no different if instead of the separator character you use the space character.* This isn't true. If you call `[str componentsSeparatedByString:@" "];` then the 3rd column is split up a bunch because its value can contain spaces. The rest of your answer is great otherwise. – rmaddy Nov 07 '12 at 00:52
  • @maddy I see your point, but that's not the meaning I intended to convey. I meant that the situation was no different if, rather than constraining `col1` and `col2` and `col3` not to contain the separator character, we constrained them not to contain the space character. In that case we would then be able to do `[str componentsSeparatedByString:@" "]` and get the different component strings as we wanted to. – Nate Chandler Nov 07 '12 at 00:59
  • @SMA2012 I edited my answer to address the fact that you actually have four strings. As before, unfortunately, the problem can't be solved as stated. The XML is the only way to go in this case. – Nate Chandler Nov 07 '12 at 01:03