4

I'm creating a class to store a filename. To do so, I need to know exactly which characters are invalid and exactly which characters are invalid as leading/trailing characters.

Windows Explorer trims leading and trailing white-space characters automatically when naming a file, so I need to trim the same characters when constructing a filename instance.

I thought about using string.Trim(), but it would be naive to assume the default set of characters it trims coincides exactly with the invalid leading/trailing filename characters of the OS.

Documentation for string.Trim() says that it trims the following characters by default: U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, U+0085, U+00A0, U+1680, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+200B, U+2028, U+2029, U+3000, U+FEFF

Unfortunately, some of the above characters are NOT invalid in a filename, because they aren't in the character set returned by System.IO.Path.GetInvalidFileNameChars().
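To make the mismatch concrete, here is a minimal sketch in Python (not .NET) that compares the two sets. The trim set is copied from the documentation list quoted above. The invalid set is an assumption: it models what Path.GetInvalidFileNameChars() is commonly reported to return on Windows (the control characters U+0000 to U+001F plus the nine punctuation characters), so verify it against the real call before relying on it. The names TRIM_DEFAULT, ASSUMED_INVALID, and trimmed_but_valid are made up for this illustration.

```python
# Code points that string.Trim() removes by default, copied from the list above.
TRIM_DEFAULT = {
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
    *range(0x2000, 0x200C),  # U+2000 through U+200B
    0x2028, 0x2029, 0x3000, 0xFEFF,
}

# ASSUMPTION: a model of Path.GetInvalidFileNameChars() on Windows:
# control characters U+0000..U+001F plus " < > | : * ? \ /
ASSUMED_INVALID = set(range(32)) | {ord(c) for c in '"<>|:*?\\/'}


def trimmed_but_valid():
    """Code points that Trim() would strip even though the assumed
    invalid-character set does not forbid them in a filename."""
    return sorted(TRIM_DEFAULT - ASSUMED_INVALID)


if __name__ == "__main__":
    for cp in trimmed_but_valid():
        print(f"U+{cp:04X}")
```

Under that assumption, only the five control characters (U+0009 to U+000D) appear in both sets; every other character Trim() removes, including the plain space U+0020, is not in the invalid set.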

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

What exactly are the invalid leading/trailing characters for a filename in the Windows Vista OS? I understand that they are not necessarily the same as those of the file system itself, since the OS can run on different file systems.

Lucero
Triynko
  • If you try to add one of the invalid characters in Windows, it will show a hover box indicating what is not allowed. – ojblass Apr 13 '09 at 04:04

3 Answers

3

Filenames can start/end in spaces. Trim will eliminate them.

File names cannot contain

/ \ : * ? " | < >
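As a rough illustration, a check against just these nine characters could look like the Python sketch below. FORBIDDEN and find_forbidden are hypothetical names introduced here, and the set contains only the characters listed above, so it is not a complete validity test (control characters and reserved device names are not covered).

```python
# The nine characters listed above; nothing else is checked here.
FORBIDDEN = set('/\\:*?"|<>')


def find_forbidden(name):
    """Return the forbidden characters found in name,
    in order of first appearance and without duplicates."""
    found = []
    for ch in name:
        if ch in FORBIDDEN and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    for candidate in [" file . foo ", "a<b>c?.txt", "report:final"]:
        print(repr(candidate), find_forbidden(candidate))
```

Note that leading and trailing spaces pass this check untouched, which is the point: rejecting forbidden characters and trimming whitespace are separate decisions.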
ojblass
  • This is exactly the answer I've seen a million times, and it upsets me. If valid filenames start and end with spaces, why does Explorer strip them off? That would make it very difficult to edit the names of such files. Documentation FAIL: http://msdn.microsoft.com/en-us/library/aa365247.asp – Triynko Apr 13 '09 at 04:13
  • 1
    I think the designers of explorer overstepped their bounds on this. Sorry if I neglected the why but I try and answer questions and not go off onto tangents. "Why does explorer strip off spaces?" is a perfectly good question to ask. – ojblass Apr 13 '09 at 04:18
  • I was just saying that those aren't a complete list of invalid characters. Path.GetInvalidFileNameChars returns more than those posted here. Explorer's rules aren't consistent with the filesystem (and probably neither with the OS). With such poor documentation, other programs are destined to fail – Triynko Apr 13 '09 at 04:24
  • Explorer strips spaces because users probably don't intend to have spaces at the beginning or end of their filenames (it's likely to lead to confusion). This isn't an OS or filesystem limitation, though. – kvb Apr 13 '09 at 04:34
  • But the fact that the API returns more than the UI shows you is absolute crap. – ojblass Apr 13 '09 at 04:39
  • And judging by the behaviors described in the next answer, it's not clear whether Explorer actually strips the spaces or just appears to. If a file is created with leading spaces, explorer won't show them, but doesn't strip them either. If I try to use explorer to add spaces, then they are removed – Triynko Apr 13 '09 at 04:41
  • 1
    I wish Explorer was consistent with the OS and allowed actual valid filenames and editing thereof. If it doesn't like a filename, it should say "Are you sure you want to pad that baby with leading spaces?", instead of just assuming we're all stupid, and failing miserably on edge cases. – Triynko Apr 13 '09 at 04:46
  • Interesting thing, max filename length is 255 characters in drive root. In subfolders, explorer preemptively determines the max file name when renaming by subtracting the lengths of the folder names and the backslashes required between them, so the path to your file will not exceed max path length. – Triynko Apr 13 '09 at 05:53
  • 1
    You are fighting a horde of poor decisions by a large team of developers and Microsoft. – ojblass Apr 19 '09 at 02:50
3

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

Yes. Even more so on a UNIX-like system, where ' X' is a valid filename and distinct from ' x '

Charlie Martin
  • The only safe thing to do then is ignore Explorer's behavior, and treat all filenames as valid except those that are explicitly restricted (COM1, LPT1, etc.) or contain characters in Path.GetInvalidFileNameChars. I can't find solid documentation for what the OS actually enforces, but lame hints abound :( – Triynko Apr 13 '09 at 04:32
  • pretty much. Most GUI-based things have more, and different, filename conventions than the underlying file system. OS/X can be a little maddening that way too. – Charlie Martin Apr 13 '09 at 14:35
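The "accept everything except reserved names and invalid characters" approach from the comments above could be sketched as follows, in Python rather than .NET. Everything here is an assumption made for illustration: is_acceptable, RESERVED, and INVALID_CHARS are hypothetical names, the invalid set models the commonly reported Windows result of Path.GetInvalidFileNameChars, and the reserved list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is matched against the part of the name before the first dot, ignoring case.

```python
# Reserved device names; compared against the part before the first dot.
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {f"COM{i}" for i in range(1, 10)}
RESERVED |= {f"LPT{i}" for i in range(1, 10)}

# ASSUMPTION: control characters U+0000..U+001F plus " < > | : * ? \ /
INVALID_CHARS = {chr(c) for c in range(32)} | set('"<>|:*?\\/')


def is_acceptable(name):
    """Accept any non-empty name unless it contains an invalid
    character or its base name is a reserved device name.
    No trimming is performed: spaces are left exactly as given."""
    if not name:
        return False
    if any(ch in INVALID_CHARS for ch in name):
        return False
    base = name.split(".", 1)[0].upper()
    return base not in RESERVED


if __name__ == "__main__":
    for candidate in [" file . foo ", "COM1", "nul.txt", "a:b", "COM10"]:
        print(repr(candidate), is_acceptable(candidate))
```

This deliberately does not mimic Explorer: a name with leading or trailing spaces is accepted as is, and the caller decides separately whether to trim.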
1

This code runs and creates the file:

Imports System.IO
Module Module1

Sub Main()
    ' Create (or overwrite) a file whose name has leading, embedded, and trailing spaces
    Dim fs As New FileStream("d:\temp\   file . foo ", FileMode.Create, _
       FileAccess.Write)
    ' Wrap the FileStream in a StreamWriter for text output
    Dim s As New StreamWriter(fs)
    ' Move to the end of the stream (position 0 here, since FileMode.Create truncates the file)
    s.BaseStream.Seek(0, SeekOrigin.End)
    ' Write two lines of text to the newly created file
    s.WriteLine("This is an example of using file handling concepts in VB .NET.")
    s.WriteLine("This concept is interesting.")
    ' Closing the writer flushes it and also closes the underlying FileStream
    s.Close()
End Sub

End Module

NOTE: the actual name of the file created with the above code appears to be " file . foo". If I edit the filename in Explorer, the leading space isn't shown, but when I rerun the code above, it replaces the file.

NOTE: I took the code from http://www.startvbdotnet.com/files/default.aspx and added the spaces

NOTE: I notice that Vista's Explorer rename won't let you add spaces before or after the filename, so you can make "foo . txt" but not " foo.txt " using that method.

jrcs3
  • I suspected such a call would succeed. I wonder how that name shows up in Explorer, and what would happen if one tried to edit it. These edge cases are just not documented; it's no wonder strange behaviors arise at such edge cases, it's never clear to the programmer to begin with. – Triynko Apr 13 '09 at 04:19
  • It shows up as " file . foo". If I try to rename it to "file . foo" it tells me that source and destination file names can't be the same. – jrcs3 Apr 13 '09 at 04:28
  • It's very odd. This is the kind of behavior that creeps into software when the documentation sucks, and all I can find is answers like "/\:*?"|<> are invalid". – Triynko Apr 13 '09 at 04:36