
I'm currently working with some code that saves a file under a user-specified filename. If the user passes in a filename with no extension, the code autodetects the extension based on the file type (stored internally).

However, I'm having a hard time determining whether the filename passed to the code has an extension or not. I'm using Path.HasExtension(filename) and Path.GetExtension(filename), but they seem to exhibit strange behavior:

File.EXT => .EXT is the extension. This is fine.

This Is A File.EXT => .EXT is the extension. This is also fine.

This Is A File. Not An Extension => . Not An Extension is the extension. However, I would think of this as a file without an extension. Windows thinks so too when I create a file with this name (creating a file with an unrecognized extension causes Windows to call it an EXTENSIONNAME File, whereas files without an extension, such as this one, are just called File).

This Is A File.Not An Extension => .Not An Extension is the extension. Same problem as above.

Also note that this same behavior is evident in Path.GetFileNameWithoutExtension(filename) (e.g. it reports the filename without extension on the last two examples to be just This Is A File).

So what I'm taking from this is that .NET and Windows differ on what they think of as an extension.


The Question: I'm wondering if it's OK for me to implement code such as this:

if(!Path.HasExtension(filename) || Path.GetExtension(filename).Contains(" ")) {...}

since that would pull my code's definition of a proper extension more in line with how Windows treats things. Or is there something I'm missing here which explicitly says I must allow spaces in my extensions?

I've searched and found this slightly similar question, but the documents linked therein only specify that it's not recommended to end the extension with a space/period -- they say nothing about spaces within the extension.

Jim Balter
NickAldwin
  • I am able to create a file `This Is A File. Not An Extension` using both Windows 7 and Windows XP. I `right click on the desktop, New, Text Document` to create the file. Or are you talking about creating the file using .Net? – Thomas Li Mar 27 '11 at 01:18
  • @Thomas I can create those files either with .NET or with Windows 7. My point is that Windows doesn't recognize the ". Not An Extension" as the file's extension, whereas .NET does. – NickAldwin Mar 27 '11 at 01:22
  • None of your claims of what "Windows thinks" are backed by any evidence or examples. The .NET functions follow standard practice--the characters following the last `.` in the last segment of a path are the extension. Doing something different if spaces occur after the `.` is extra work for no good reason. Also, for your use case, there's no need to make the distinction--if the filename already has the extension appropriate to the content type then do nothing, else append the appropriate extension ... that's better than *replacing* the existing extension, which might not actually be one. – Jim Balter Jun 16 '20 at 01:54
  • As for what's "ok" ... whatever you do is ok as long as you *document it*, so that your users know how their input will be handled. – Jim Balter Jun 16 '20 at 01:56
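
The "do nothing, else append" idea from the comment above could be sketched roughly as follows. This is only an illustration: `ExtensionAppender` and `EnsureExtension` are hypothetical names, and the case-insensitive comparison is an assumption (it matches how Windows file systems usually behave, but nothing in the question requires it).

```csharp
using System;
using System.IO;

public static class ExtensionAppender
{
    // Hypothetical helper: returns fileName unchanged when it already ends with
    // expectedExtension (which includes the leading dot, e.g. ".txt");
    // otherwise appends expectedExtension instead of replacing anything.
    public static string EnsureExtension(string fileName, string expectedExtension)
    {
        string current = Path.GetExtension(fileName);

        // Assumption: extensions are compared without regard to case.
        if (string.Equals(current, expectedExtension, StringComparison.OrdinalIgnoreCase))
        {
            return fileName;
        }

        return fileName + expectedExtension;
    }

    public static void Main()
    {
        Console.WriteLine(EnsureExtension("report.txt", ".txt"));
        Console.WriteLine(EnsureExtension("report", ".txt"));
        Console.WriteLine(EnsureExtension("This Is A File. Not An Extension", ".txt"));
    }
}
```

With this approach the question of whether ". Not An Extension" counts as an extension never comes up: it is not the expected one, so the expected one is appended and nothing the user typed is thrown away.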

3 Answers


The extension on a filename in Windows is purely a convention. The GetExtension and HasExtension methods only look for a dot in the filename and act accordingly. You are free to put spaces anywhere you like within the filename (including the extension).

When you say "Windows thinks so too", it's really just some code in Explorer that tries to parse out extensions, and it simply uses a slightly different algorithm than .NET.
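
To see the "last dot wins" behavior for yourself, a small console sketch like the one below can help. It runs the four sample names from the question through the `System.IO.Path` methods; `ExtensionDemo` and `Describe` are names made up for this illustration.

```csharp
using System;
using System.IO;

public static class ExtensionDemo
{
    // The four sample names from the question.
    public static readonly string[] Samples =
    {
        "File.EXT",
        "This Is A File.EXT",
        "This Is A File. Not An Extension",
        "This Is A File.Not An Extension",
    };

    // Formats what the Path methods report for one name.
    public static string Describe(string name)
    {
        return $"\"{name}\" -> HasExtension={Path.HasExtension(name)}, " +
               $"ext=\"{Path.GetExtension(name)}\", " +
               $"stem=\"{Path.GetFileNameWithoutExtension(name)}\"";
    }

    public static void Main()
    {
        foreach (string name in Samples)
        {
            Console.WriteLine(Describe(name));
        }
    }
}
```

For the last two names this should report ". Not An Extension" and ".Not An Extension" as the extension, matching what the question describes: the methods only care where the final dot is, not what follows it.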

Gabe
  • OK, thanks for the answer. From a human standpoint, do you consider it reasonable to assume the user probably didn't mean it to be an extension if it has a space in it? – NickAldwin Mar 27 '11 at 01:24
  • You can easily imagine that nobody would create a program that uses extensions with spaces in them, so any space in an extension must mean it's not an extension. – Gabe Mar 27 '11 at 01:55

How the filesystem handles names and how the Windows shell (i.e. Explorer) handles file names are two completely different beasts.

The filesystem doesn't care about spaces, dots or anything else -- to it, the filename is just one opaque string (with some restrictions on allowed characters). The name/extension separation is just a made-up convention. The shell, on the other hand, is free to make up its own interpretation of what an extension is because its purpose is not to store and retrieve file information but rather to provide the user with a better experience. So don't go looking there for answers.

I would suggest going with what the System.IO methods return (because following the convention is good), but you can do whatever you like in your code if there's a good reason for it.

Jon
  • "The filesystem doesn't care about spaces, dots" -- this actually isn't true ... create some files ending with dot or space (you can do this with cygwin, e.g.) and see how the system handles them. (Better do this on an FS that you don't care about.) And this is a human engineering question. Windows Explorer is how most users interact with the system, and its quirks need to be accounted for. – Jim Balter Jun 16 '20 at 02:14
  • @JimBalter IMO the quirks of Windows Explorer should only be accounted for in answers to Stack Overflow questions tagged accordingly (which this one is not, and furthermore the code it contains is clearly a C#/.NET affair). – Jon Jun 16 '20 at 17:18
  • Um, the question is about users of Windows, and what "Windows thinks", which upon careful reading of the question is actually about what Windows Explorer displays as the file type. The OP wanted to know about the reasonableness of writing .NET code that acts the same way. And I didn't comment on what should be "accounted for in answers", just what the facts are, so I find your response to be an irrelevant strawman. I won't respond further. – Jim Balter Jun 16 '20 at 23:21

There is no official definition of what an extension is. The common convention is that everything after the final . is the extension.

However, if you grabbed a huge list of commonly used extensions, I think you would find only a handful that contain spaces.

I would say, disallow spaces in extensions. 999/1000 times the user didn't mean it as an extension.

To quote Wikipedia on filenames:

. (DOT): allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. In other OSes, usually considered as part of the filename, and more than one full stop may be allowed.
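
If you do go with disallowing spaces, the check from the question can be wrapped in a small helper. The following is a minimal sketch, not an official rule: `FileNameRules` and `HasConventionalExtension` are hypothetical names, and only the plain space character is tested, as in the question's own condition.

```csharp
using System;
using System.IO;

public static class FileNameRules
{
    // Hypothetical helper: true only when there is text after the last dot
    // and that text contains no space. This is the inverse of the condition
    // proposed in the question.
    public static bool HasConventionalExtension(string fileName)
    {
        // GetExtension returns "" when there is no dot or the dot is the
        // last character, and null when fileName is null.
        string ext = Path.GetExtension(fileName);

        if (string.IsNullOrEmpty(ext))
        {
            return false;
        }

        return !ext.Contains(" ");
    }

    public static void Main()
    {
        Console.WriteLine(HasConventionalExtension("This Is A File.EXT"));
        Console.WriteLine(HasConventionalExtension("This Is A File. Not An Extension"));
        Console.WriteLine(HasConventionalExtension("This Is A File"));
    }
}
```

The calling code would then autodetect and add an extension whenever the helper returns false.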

orlp