
I have this code to get all files from a folder:

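// Recursively collects the full path of every file and folder below startPath.
// self must respond to the NSFileManager methods used below.
- (NSMutableArray*) allFilesAtPath:(NSString *)startPath
{
    NSMutableArray* listing = [NSMutableArray array];
    // Names exactly as the file system returns them; errors are ignored.
    NSArray* fileNames = [self contentsOfDirectoryAtPath:startPath error:nil];
    if (!fileNames) return listing;

    for (NSString* file in fileNames) {
        NSString* absPath = [startPath stringByAppendingPathComponent:file];

        BOOL isDir = NO;
        if ([self fileExistsAtPath:absPath isDirectory:&isDir]) {

            [listing addObject:absPath];
            // Descend into subfolders and append everything found there.
            if (isDir) [listing addObjectsFromArray:[self allFilesAtPath:absPath]];
        }
    }
    return listing;
}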

In one test folder, I have a file named yahoéo.jpg. When NSLogged, it is displayed as yahoe\U0301o.jpg.

Of course, this works fine for every other file that has no such accented character in its name.

So, when I try to remove it from the array with:

[theFilesArray removeObject:fileName];

where fileName is yahoéo.jpg, it is not removed because it is not found in the array.

Why does this character replacement happen? I can't find anything about it in the documentation. Which characters are supposed to get the same treatment? How should I have known that?

And most of all, how can I get the é character into the file name array?

EDIT

The fileName variable used in the removeObject: call is built by reading a string from a plist file and passing it to the following method:

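// Turns a string read from the plist into a file name: each '.', ':', '/'
// and '\' is replaced by '#', then the ".jpg" extension is appended.
+ (NSString*) fileNameWithString:(NSString*)str
{
    NSString* fileName = str;

    // Characters that must not appear in the file name; each one becomes '#'.
    NSCharacterSet* charactersToRemove = [NSCharacterSet characterSetWithCharactersInString:@".:/\\"];
    fileName = [[fileName componentsSeparatedByCharactersInSet:charactersToRemove] componentsJoinedByString:@"#"];

    fileName = [fileName stringByAppendingString:@".jpg"];

    return fileName;
}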
Oliver
  • The NSLog output of an NSArray shows all non-ASCII characters in `\Unnnn` escaped form. But that is only the way NSLog prints it; the actual string contains `é`, so that should not be the problem. But where do you get the `fileName` in `[theFilesArray removeObject:fileName]` from? It could be a problem of "precomposed" vs "decomposed" characters. – Martin R Jul 26 '13 at 22:00
  • You will not see a difference between a precomposed and a decomposed string when it is displayed (but the length should be different). Try normalizing as suggested below. – Martin R Jul 26 '13 at 22:34


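The NSLog output of an NSArray shows all non-ASCII characters in \Unnnn escaped form. But that is only the way NSLog prints it, so that by itself should not be the problem. (A precomposed é would have been printed as \U00e9, so the e\U0301 in the log already hints at the decomposed form discussed below.)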

I assume this is a problem of "precomposed" vs "decomposed" characters. The HFS+ file system stores file names with decomposed characters, so é is stored as two Unicode characters:

U+0065 + U+0301  = "e" + COMBINING ACUTE ACCENT

(and NSLog prints that as e\U0301).

This is different from the single Unicode character of the precomposed form:

U+00E9 = "é"

Therefore, the string yahoéo.jpg will not be found in the array if its characters are stored in precomposed form.
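A minimal Objective-C sketch of this mismatch, assuming only Foundation. The constant names and the RemoveSucceeds function are made up for illustration. The two literals spell the same visible name in the two forms given above, and removeObject: does not find the decomposed entry when it is handed the precomposed string.

```objc
#import <Foundation/Foundation.h>

// Illustrative constants: the same visible name "yahoéo.jpg" in both forms.
// Escapes are used so the encoding of this source file cannot change the form.
static NSString * const kPrecomposedName = @"yaho\u00E9o.jpg";  // é = U+00E9
static NSString * const kDecomposedName  = @"yahoe\u0301o.jpg"; // é = U+0065 U+0301

// Hypothetical demo: the array holds the decomposed name (as listed from an
// HFS+ volume) and removeObject: is called with the precomposed one.
// Returns YES only if the entry was actually removed.
BOOL RemoveSucceeds(void) {
    NSMutableArray *files = [NSMutableArray arrayWithObject:kDecomposedName];
    [files removeObject:kPrecomposedName];
    return files.count == 0;
}
```

Here kPrecomposedName.length is 10, kDecomposedName.length is 11, and RemoveSucceeds() returns NO. The one-character difference is the same kind of length mismatch that shows up when both strings are logged.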

If that is really the problem, you can solve it by normalizing all file names to the same form, either precomposed or decomposed, using the precomposedStringWithCanonicalMapping or decomposedStringWithCanonicalMapping method of NSString.
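A possible sketch of the normalization approach, assuming Foundation; CanonicalName and CanonicalNames are hypothetical helper names, not Foundation API. It maps both the listed paths and the name being looked up through decomposedStringWithCanonicalMapping, so it should not matter which form the plist string happened to use. Decomposed form is picked here only because it is what the file system returned; precomposed would work the same way, as long as both sides go through the same mapping.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps a name or path to canonical decomposed form.
NSString *CanonicalName(NSString *name) {
    return [name decomposedStringWithCanonicalMapping];
}

// Hypothetical helper: returns a mutable copy of `paths` with every entry
// normalized, e.g. for the array returned by allFilesAtPath:.
NSMutableArray *CanonicalNames(NSArray *paths) {
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:paths.count];
    for (NSString *path in paths) {
        [result addObject:CanonicalName(path)];
    }
    return result;
}
```

For example, the array could be built with CanonicalNames(...) around the result of allFilesAtPath:, and the entry removed with removeObject:CanonicalName(fileName). Normalizing both sides, rather than only the plist string, avoids depending on exactly which form the file system hands back.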

Remarks:

  • Both the precomposed and the decomposed version of a string are displayed in the same way (e.g. é).
  • The compare: method of NSString considers both versions of the string as equal (unless you call it with the NSLiteralSearch option).
  • The isEqual: method of NSString considers the two versions of the string as different, and that is used by removeObject: to find the object to remove.
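These remarks can be checked with a small sketch (Foundation only; the constants and function names are illustrative). It compares the precomposed and the decomposed é with compare:, with compare:options: using NSLiteralSearch, and with isEqual:.

```objc
#import <Foundation/Foundation.h>

// Two spellings of "é": precomposed U+00E9 and decomposed "e" + U+0301.
static NSString * const kComposedE   = @"\u00E9";
static NSString * const kDecomposedE = @"e\u0301";

// compare: with no options is a non-literal comparison, so canonically
// equivalent strings come out as NSOrderedSame.
BOOL SameForCompare(NSString *a, NSString *b) {
    return [a compare:b] == NSOrderedSame;
}

// With NSLiteralSearch the characters are compared one by one.
BOOL SameForLiteralCompare(NSString *a, NSString *b) {
    return [a compare:b options:NSLiteralSearch] == NSOrderedSame;
}

// isEqual: (used by removeObject: and indexOfObject:) also treats the two
// spellings as different.
BOOL SameForIsEqual(NSString *a, NSString *b) {
    return [a isEqual:b];
}
```

With these inputs SameForCompare returns YES, while SameForLiteralCompare and SameForIsEqual return NO.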
Martin R
  • In NSLog, the file name from the array is displayed with the escaped characters, but the other one is displayed correctly with the accent. It is built by a chain of methods, but the point is that it is correct. I have updated the question to show where I get it from. The real problem comes from what is stored in the array by the allFilesAtPath method. But the path is too long to see the end of the string in the debugger, so I have to use NSLog. – Oliver Jul 26 '13 at 22:34
  • Why should I normalize anything in the allFilesAtPath method? Where in the documentation would I find such a recommendation? If files have accents, why wouldn't they be captured with those accents? – Oliver Jul 26 '13 at 22:36
  • @Oliver: There are different ways to represent "combined characters" such as é (this is documented, e.g., for NSString). It is also documented somewhere that the HFS+ file system uses the decomposed form. So if your plist file used the precomposed form, the strings would be different. - Perhaps you can try whether it helps, and then we can discuss if and where this is documented. – Martin R Jul 26 '13 at 22:43
  • @Oliver: You could NSLog the string length to check whether this is a precomposed vs decomposed problem. In precomposed form, é is one character; in decomposed form, it is two characters. - I can't find the Apple Technote at the moment, but see http://en.wikipedia.org/wiki/HFS_Plus: *"... which means that precomposed characters like å are decomposed in the HFS+ filename and therefore count as two characters ..."* – Martin R Jul 26 '13 at 22:49
  • Try 1: whether I use return [fileName decomposedStringWithCanonicalMapping]; or return [fileName precomposedStringWithCanonicalMapping];, the é character is always in fileName. – Oliver Jul 26 '13 at 23:01
  • @Oliver: Both forms are displayed as é. Try logging the lengths of the strings. - It is late now in Germany, will return tomorrow! – Martin R Jul 26 '13 at 23:04
  • Try 2: same thing when applying decomposition or precomposition to the strings returned by the allFilesAtPath method: the é never appears. – Oliver Jul 26 '13 at 23:06
  • The one in filePathes is 219 characters long, while fileName is 218 long. No leading or trailing spaces. Both show é in NSLog, and visually they are exactly the same there. I didn't do anything special except NSLogging them individually as strings. I guess that when NSLog displays strings from an NSArray, it converts é to a decomposed form. But the problem is still there. – Oliver Jul 26 '13 at 23:50
  • OK, the problem was that one string was precomposed and the other decomposed. Forcing decomposition on the self-made one made them match. Leaving the current problem aside, my question is: how can you detect that you are using a decomposed string if you missed that in the documentation and have nothing to compare it with? And why, since it is the same string, does the comparison not take that into account? Do you know? – Oliver Jul 27 '13 at 00:52
  • @Oliver: The NSString "compare:" method actually considers both versions of the string as equal, but NSArray's "removeObject:" uses "isEqual:" to find the object in the array, and that considers the two versions as different. – Martin R Jul 27 '13 at 14:37
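On the question of how to detect which form a string is in: one approach, sketched here as an assumption rather than an official recipe, is to check whether normalizing the string changes it, and to log its UTF-16 code units instead of the rendered text. IsDecomposed, IsPrecomposed and CodeUnitDump are hypothetical helper names built only on Foundation calls. A plain ASCII string counts as both decomposed and precomposed, since neither mapping changes it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical diagnostic: returns the UTF-16 code units of `s` as "U+XXXX"
// tokens, so two spellings that render identically can be told apart in a log.
NSString *CodeUnitDump(NSString *s) {
    NSMutableArray *parts = [NSMutableArray arrayWithCapacity:s.length];
    for (NSUInteger i = 0; i < s.length; i++) {
        unsigned unit = (unsigned)[s characterAtIndex:i];
        [parts addObject:[NSString stringWithFormat:@"U+%04X", unit]];
    }
    return [parts componentsJoinedByString:@" "];
}

// YES if `s` is already in canonical decomposed form: decomposing it again
// changes nothing.
BOOL IsDecomposed(NSString *s) {
    return [s isEqualToString:[s decomposedStringWithCanonicalMapping]];
}

// YES if `s` is already in canonical precomposed form.
BOOL IsPrecomposed(NSString *s) {
    return [s isEqualToString:[s precomposedStringWithCanonicalMapping]];
}
```

For example, CodeUnitDump applied to a decomposed é gives "U+0065 U+0301", and to a precomposed é gives "U+00E9".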