I'm reading the NTFS Master File Table / USN journal of several disks/partitions (C:, D:, E:, F:, etc.), and I'd like a unique ID for each file/directory.

While reading a USN_RECORD (accessed through a PUSN_RECORD pointer), I see this 64-bit field:

DWORDLONG     FileReferenceNumber;

that is a unique file/directory identifier, unique at least in the current partition.

But there could be collisions:

  • a file in C: could have FileReferenceNumber 1932847
  • another file in D: could have FileReferenceNumber 1932847 too!

I'd like to avoid something as large as an int128 (64 bits for FileReferenceNumber plus 5 bits for the drive letter C:, D:, E:, ..., Z:).

I also would like to avoid having to use a pair (char DriveLetter, DWORDLONG FileReferenceNumber) to identify a file in the computer.

How can I encode FileReferenceNumber plus the drive letter in a single 64-bit int?

Is this possible, for instance because FileReferenceNumber has a few unused bits?

If not, how would you handle this?

Basj
  • Note that using a drive letter is not necessarily sufficient; you can have multiple volumes mounted under the same drive letter. – Harry Johnston Jul 11 '17 at 23:01
  • @HarryJohnston How? Can you give an example? – Basj Jul 11 '17 at 23:05
  • https://technet.microsoft.com/en-us/library/cc753321(v=ws.11).aspx – Harry Johnston Jul 11 '17 at 23:05
  • what would be a concrete example? – Basj Jul 11 '17 at 23:06
  • I don't know what you mean by "concrete" in this context. But I look after some computers that have three partitions on the hard disk; one is the hidden "system" partition, the second is the C drive, and the third is mounted at `c:\some\path\mount`. As Anders says, if you are reading the MFT from a volume then of course you will only get entries from that volume. So in your case I guess the only potential problem is that if you limit yourself to volumes with drive letters you might not be including all of the volumes that are present. – Harry Johnston Jul 11 '17 at 23:10

1 Answer

You must use a pair of FileReferenceNumber/FileID and "volume something". You can mount a volume in a folder so you cannot really use the drive letter.

Ideally "volume something" is the volume GUID path but you can use the volume serial number if size is important. Note: Not all volumes have a GUID.

For NTFS you can get it from GetFileInformationByHandle and build a 32-bit+64-bit pair. For ReFS you need GetFileInformationByHandleEx and build a 64-bit+128-bit pair.
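
As a rough sketch of the NTFS case (not a definitive implementation): the 32-bit volume serial number and the 64-bit file ID can go into a small key struct with a custom hash, so it can still be used as a key in a hash map. The names `FileKey`, `FileKeyHash` and `keyFromHandle` are made up for illustration; only `GetFileInformationByHandle` and the `BY_HANDLE_FILE_INFORMATION` fields come from the Windows API, and that part is compiled only on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type: (volume serial number, NTFS file ID).
struct FileKey {
    std::uint32_t volumeSerial;  // BY_HANDLE_FILE_INFORMATION::dwVolumeSerialNumber
    std::uint64_t fileId;        // nFileIndexHigh:nFileIndexLow combined

    bool operator==(const FileKey& other) const noexcept {
        return volumeSerial == other.volumeSerial && fileId == other.fileId;
    }
};

// Hypothetical hash functor so FileKey can be a std::unordered_map key.
struct FileKeyHash {
    std::size_t operator()(const FileKey& k) const noexcept {
        std::size_t h = std::hash<std::uint64_t>{}(k.fileId);
        // Mix in the volume serial (boost-style hash combine).
        h ^= std::hash<std::uint32_t>{}(k.volumeSerial)
             + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

#ifdef _WIN32
#include <windows.h>

// Fills `out` from an already opened file or directory handle.
// Returns false if the API call fails.
inline bool keyFromHandle(HANDLE h, FileKey& out) {
    BY_HANDLE_FILE_INFORMATION info;
    if (!GetFileInformationByHandle(h, &info)) {
        return false;
    }
    out.volumeSerial = info.dwVolumeSerialNumber;
    out.fileId = (static_cast<std::uint64_t>(info.nFileIndexHigh) << 32)
                 | info.nFileIndexLow;
    return true;
}
#endif
```

With this, `std::unordered_map<FileKey, std::string, FileKeyHash>` keeps two files apart even when both have file ID 1932847 on different volumes. The key is 12 bytes of payload (16 with padding), so it is larger than a plain int64 but still cheap to hash and compare.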

Anders
  • Just to be sure: let's say I'm enumerating MasterFileTable (`FSCTL_ENUM_USN_DATA`) of `\\.\C:`. I will only have FileReferenceNumber of the files in this volume/partition, and not of C:\MOUNTEDFOLDER\ (if this is another volume mounted as a folder in C), right? – Basj Jul 11 '17 at 14:32
  • Until now I was using a `std::map` whose keys were `FileReferenceNumber`s, i.e. int64, which was very efficient. If I now have to use *pairs* (volume, fileID) as map keys, it will be much slower, don't you think, @Anders? Using integers as map keys is certainly faster. – Basj Jul 11 '17 at 14:35
  • Using the volume serial number is reasonably safe but only a GUID is guaranteed to be unique AFAIK. I don't know the specifics of FSCTL_ENUM_USN_DATA but if it operates on the MFT then it should not return entries from mounted volumes, only the folder where the volume is mounted. (Should be easy for you to test either way). – Anders Jul 11 '17 at 16:27
  • If performance is an issue, you could have one map per volume rather than a single map that uses pairs as keys. – Harry Johnston Jul 11 '17 at 23:06
  • @HarryJohnston that's what I was thinking too, but then searching needs an extra loop over the volumes, which spoils the elegance of the code ;) – Basj Jul 12 '17 at 08:45
  • At least for directories only, aren't there a few bits always unused in the `FileReferenceNumber` DWORDLONG 64 bits? – Basj Jul 12 '17 at 08:48
  • @Basj, I really don't think you can count on that. – Harry Johnston Jul 12 '17 at 21:32
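
The one-map-per-volume idea from the comments above could look roughly like the sketch below. All names (`MultiVolumeIndex`, `FileRef`, `mftRecordNumber`, `sequenceNumber`) are hypothetical, and keying the outer map by volume serial number is an assumption; a volume GUID string would work the same way. The two helper functions at the end reflect the NTFS layout as I understand it: the low 48 bits of a file reference are the MFT record number and the high 16 bits are a sequence number, so on NTFS the upper bits are in use rather than free. Other file systems (ReFS uses 128-bit IDs) do not follow this layout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical aliases: one inner map per volume, keyed by the
// unmodified 64-bit FileReferenceNumber.
using FileRef = std::uint64_t;
using VolumeIndex = std::unordered_map<FileRef, std::string>;

class MultiVolumeIndex {
public:
    void add(std::uint32_t volumeSerial, FileRef ref, std::string name) {
        volumes_[volumeSerial][ref] = std::move(name);
    }

    // Returns nullptr when the volume or the file reference is unknown.
    const std::string* find(std::uint32_t volumeSerial, FileRef ref) const {
        auto v = volumes_.find(volumeSerial);
        if (v == volumes_.end()) {
            return nullptr;
        }
        auto f = v->second.find(ref);
        return f == v->second.end() ? nullptr : &f->second;
    }

private:
    // Outer key: volume serial number (assumption; could be a GUID string).
    std::unordered_map<std::uint32_t, VolumeIndex> volumes_;
};

// NTFS-specific split of a file reference: low 48 bits are the MFT
// record number, high 16 bits are the sequence number.
constexpr std::uint64_t mftRecordNumber(FileRef ref) {
    return ref & 0x0000FFFFFFFFFFFFull;
}

constexpr std::uint16_t sequenceNumber(FileRef ref) {
    return static_cast<std::uint16_t>(ref >> 48);
}
```

When the caller already knows which volume a file reference came from, which is the case while enumerating one volume's journal, a lookup is two hash probes and needs no loop over volumes. A loop is only needed when the volume is unknown.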