
I am currently using the following P/Invoke signature to get the short filename of a regular Windows file:

[DllImport("kernel32.dll", CharSet = CharSet.Auto)]
public static extern int GetShortPathName([MarshalAs(UnmanagedType.LPTStr)] string path,
                                          [MarshalAs(UnmanagedType.LPTStr)] StringBuilder shortPath,
                                          int shortPathLength);
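Here is a minimal sketch of calling this import safely, assuming .NET on Windows. The helper class `ShortPath` and its `Get` method are hypothetical names. Adding `SetLastError = true` to the attribute is an addition, so that `Marshal.GetLastWin32Error()` can report failures. The loop relies on documented `GetShortPathName` behavior: when the buffer is too small (including a NULL buffer of size 0), the function returns the required size including the terminating null character. On success, it returns the length without it.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper class; not part of any library.
static class ShortPath
{
    // Same kernel32 export as in the question, plus SetLastError = true (an addition).
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern int GetShortPathName(string path, StringBuilder shortPath, int shortPathLength);

    // Returns the short form of an existing path. If the file has no 8.3 name
    // on disk (e.g. 8.3 generation is disabled), the long path may come back unchanged.
    public static string Get(string longPath)
    {
        StringBuilder buffer = null; // first call passes NULL to ask for the required size
        int size = 0;
        while (true)
        {
            int result = GetShortPathName(longPath, buffer, size);
            if (result == 0)
                throw new Win32Exception(Marshal.GetLastWin32Error());
            if (result < size)
                return buffer.ToString(); // success: result is the length without the NUL
            // Buffer too small: result is the required size including the NUL, so grow and retry.
            size = result;
            buffer = new StringBuilder(size);
        }
    }
}
```

The loop form also covers the rare case where the name changes between the size query and the second call.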

Currently it works without any problems, but I noticed something rather peculiar.
I know that Windows uses the following short filename convention:

Cut the name to 6 characters (without extension)
Append the tilde (~)
Append an unsigned integer number which indicates the match index (starting with 1)
Append the original file extension

Thus, the file name C:\abcdefghijklmn.txt should be accessible under the short name C:\ABCDEF~1.TXT (which works perfectly fine).
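As a point of comparison, here is a literal transcription of the four steps listed above. `NaiveShortName` and `Make` are hypothetical names. Uppercasing is added because Windows stores 8.3 names in uppercase. Otherwise the sketch does only what the list says: it does not strip invalid characters, does not truncate the extension, and does not handle collisions.

```csharp
using System;
using System.IO;

// Hypothetical helper that follows only the convention described in the question.
static class NaiveShortName
{
    public static string Make(string longName, int index)
    {
        string baseName = Path.GetFileNameWithoutExtension(longName);
        string ext = Path.GetExtension(longName); // includes the leading '.', or "" if none
        string prefix = baseName.Length > 6 ? baseName.Substring(0, 6) : baseName; // step 1
        return (prefix + "~" + index + ext).ToUpperInvariant();                   // steps 2-4
    }
}
```

For `abcdefghijklmn.txt` with index 1, this yields `ABCDEF~1.TXT`, and for `Rammstein - Der Meister.mp3` it yields `RAMMST~1.MP3`. However, it keeps spaces, `+` and long extensions, which the answers below explain Windows does not.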

Now the strange part: I recently performed a small search inside my music directory for specific audio files. This was the result:

.\Rammstein & Tatu - Moscow.mp3
.\Rammstein - Asche zu Asche.mp3
.\Rammstein - Der Meister.mp3
.\Rammstein - Du Hast.mp3
.\Rammstein - Eifersucht.mp3
.\Rammstein - Feuer Frei.mp3
.\Rammstein - Führe Mich.mp3
.\Rammstein - Haifisch.mp3
...

And the same search in short notation:

.\RA8E17~1.MP3
.\RA23A6~1.MP3
.\RAMMST~1.MP3
.\RA0CAE~1.MP3
.\RAMMST~2.MP3
.\RAMMST~3.MP3
.\RAMMST~4.MP3
.\RA6BAA~1.MP3
...

My question is: why does Windows generate such "random" prefixes before the tilde (like RA23A6 or RA0CAE)?

user2864740
unknown6656
  • `I know (...) short filename convention:` you don't know that; you assume those facts are the way it works. You should not. I have never found a conclusive source on exactly how short filenames are created. [This is the official source by Microsoft](https://support.microsoft.com/en-us/kb/142982), and it still does not list all the cases I have seen. – RedX Jul 20 '15 at 21:01
  • I would have thought that was obvious. You can't have more than one file in the folder with the same short name. – Jonathan Potter Jul 20 '15 at 21:05
  • After scanning for RAMMST~1, ~2, ~3 and ~4, Windows decides to use a random-looking part in the prefix instead of scanning further. In short, it is done for speed. – i486 Aug 24 '15 at 09:50

2 Answers

Microsoft does not document this, but Wikipedia does:

8.3 filename:

Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:

1. If the LFN is 8.3 uppercase, no LFN will be stored on disk at all.

  • Example: TEXTFILE.TXT

2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.

  • Example: TextFile.Txt becomes TEXTFILE.TXT.

3. If the filename contains characters not allowed in an 8.3 name (including space, which was disallowed by convention though not by the APIs) or either part is too long, the name is stripped of invalid characters such as spaces and extra periods. Other characters such as + are changed to the underscore _, and uppercased. The stripped name is then truncated to the first 6 letters of its basename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile1.Mine.txt becomes TEXTFI~1.TXT (or TEXTFI~2.TXT, should TEXTFI~1.TXT already exist). ver +1.2.text becomes VER_12~1.TEX.

4. Beginning with Windows 2000, if at least 4 files or folders already exist with the same initial 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile.Mine.txt becomes TE021F~1.TXT.
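Rules 3 and 4 can be sketched as a choice between candidate names. This assumes the base name has already been stripped and uppercased and the extension truncated. `ShortNameChooser`, `Choose` and `PlaceholderHash` are hypothetical names. Because the real checksum is undocumented, `PlaceholderHash` is an FNV-1a-style stand-in and will not reproduce the hex digits Windows produces (such as `021F`). The sketch tries `~1` through `~4` before switching to the hashed prefix, following the "at least 4" threshold. Allowing `~1` through `~9` after the switch is a simplification.

```csharp
using System;
using System.Collections.Generic;

static class ShortNameChooser
{
    // HYPOTHETICAL stand-in for the undocumented Windows checksum (FNV-1a folded to 16 bits).
    // It produces 4 hex digits of the right shape, but not the values Windows uses.
    static ushort PlaceholderHash(string longName)
    {
        uint h = 2166136261;
        foreach (char c in longName)
        {
            h ^= c;
            h *= 16777619;
        }
        return (ushort)(h ^ (h >> 16));
    }

    // stripped: base name already stripped and uppercased per rule 3 (e.g. "TEXTFILE1MINE").
    // ext: extension already truncated to at most 3 characters (e.g. "TXT"), or "".
    // existing: short names already present in the directory.
    public static string Choose(string longName, string stripped, string ext, ISet<string> existing)
    {
        string suffix = ext.Length > 0 ? "." + ext : "";

        // Rule 3: first 6 characters, then ~1 .. ~4.
        string prefix6 = stripped.Length > 6 ? stripped.Substring(0, 6) : stripped;
        for (int i = 1; i <= 4; i++)
        {
            string candidate = prefix6 + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }

        // Rule 4: first 2 characters (or 1) + 4 hex digits of a hash of the long name.
        string prefix2 = stripped.Length >= 2 ? stripped.Substring(0, 2) : stripped;
        string hashed = prefix2 + PlaceholderHash(longName).ToString("X4");
        for (int i = 1; i <= 9; i++)
        {
            string candidate = hashed + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }
        throw new InvalidOperationException("No free short name in this sketch.");
    }
}
```

With TEXTFI~1.TXT through TEXTFI~4.TXT already present, a fifth name comes out as TE, four hex digits, and ~1.TXT. This is the same shape as RA8E17~1.MP3 in the question's listing.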

As Joey mentions, the undocumented hash of the filename has been reverse-engineered.

Community
Remy Lebeau
  • Possibly worth pointing out that if this is undocumented then it is considered implementation detail and has the potential to change, so you shouldn't rely on it. – icabod Jul 21 '15 at 11:28

That's because the very primitive scheme of using a prefix and a counter only works for a small number of files sharing the same prefix. Beyond that, Windows switches to a shorter prefix plus a hash. Someone actually reverse-engineered the hash and included a bit of explanation:

In case you aren’t aware of how 8.3 file names work, here’s a quick run-down.

  • All periods other than the one separating the filename from the extension are dropped: a.testing.file.bat turns into atestingfile.bat.
  • Certain special characters like + are turned into underscores, and others are dropped. The file name is upper-cased. 1+2+3 Hello World.exe turns into 1_2_3HELLOWORLD.EXE.
  • The file extension is truncated to 3 characters, and (if longer than 8 characters) the file name is truncated to 6 characters followed by ~1. SomeStuff.aspx turns into SOMEST~1.ASP.
  • If these would cause a collision, ~2 is used instead, followed by ~3 and ~4.
  • Instead of going to ~5, the file name is truncated down to 2 characters, with the rest replaced by a hexadecimal checksum of the long filename: SomeStuff.aspx turns into SOBC84~1.ASP, where BC84 is the result of the (previously) undocumented checksum function.
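The character-level steps in the first two bullets, together with the extension truncation from the third, can be sketched as follows. `ShortNameStripper` and `Strip` are hypothetical names. Only `+` is named above as becoming an underscore, so also mapping `, ; = [ ]` to `_` is an assumption: these are the other characters that are allowed in long names but forbidden in 8.3 names. Non-ASCII characters and names that start with a period are not handled.

```csharp
using System;
using System.Text;

static class ShortNameStripper
{
    // ASSUMPTION: characters mapped to '_'. Only '+' is confirmed by the list above.
    const string ToUnderscore = "+,;=[]";

    // Splits a long name into an uppercased base and an extension of at most 3 characters.
    public static (string Base, string Ext) Strip(string longName)
    {
        int lastDot = longName.LastIndexOf('.'); // only the last period separates the extension
        string rawBase = lastDot >= 0 ? longName.Substring(0, lastDot) : longName;
        string rawExt = lastDot >= 0 ? longName.Substring(lastDot + 1) : "";
        return (Clean(rawBase), Truncate(Clean(rawExt), 3));
    }

    static string Clean(string part)
    {
        var sb = new StringBuilder();
        foreach (char c in part)
        {
            if (c == '.' || c == ' ') continue; // extra periods and spaces are dropped
            sb.Append(ToUnderscore.IndexOf(c) >= 0 ? '_' : char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }

    static string Truncate(string s, int max) => s.Length > max ? s.Substring(0, max) : s;
}
```

For SomeStuff.aspx, this gives SOMESTUFF and ASP. Truncating the base to six characters and appending ~1 produces SOMEST~1.ASP, which matches the example above.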
Joey