I want to detect if I saw a file already and would like to identify it with something unique. Under Linux there is the inode number together with the device id (see stat() or fstat()). I assume under Windows I would find something similar.

To start simple, boost::filesystem offers convenient methods, e.g. I can use boost::filesystem::recursive_directory_iterator to traverse the directory tree. The file_status tells me whether an entry is a regular file, but it does not give me the inode number.

The closest thing I found was boost::filesystem::equivalent() taking two paths. I guess this is also the most portable design.

The thing is that I would like to put the inode numbers into a database to have a quick lookup. I cannot do this with equivalent(): I would have to call it against every path already stored in the database.

Am I out of luck and boost will not provide me such information due to portability reasons?

(edit) The intention is to detect duplicates via hardlinks during one scan of a folder tree. equivalent() does exactly that, but using it would require comparing every file against every other one, i.e. a quadratic algorithm.

Borph
  • For your use-case you should also know that the inode can be re-used if a file is deleted. That's true both of Linux/unix inodes and Windows File ID/MFT records. So if the inode you see today is the same as the inode you saw yesterday, that means nothing. Inodes are only useful if you know the files ***both*** exist ***now***. Also if a file is deleted, and an identical file created with the identical name, the inode will probably be different. That's a common paradigm for "updating" files. So "same inode different time" != "same file" and "different inode different time" != "different file". – Ben May 08 '14 at 19:17
  • You are right. During one scan it isn't an issue: you just get to know the 'other' hardlink (barring race conditions). But you rightly point out that you cannot rely on them beyond that. – Borph May 08 '14 at 20:04

1 Answer

The Windows CRT implementation of stat always reports zero for the inode, so you will have to roll your own. The reason is that on Windows FindFirstFile is faster than GetFileInformationByHandle, so stat uses FindFirstFile, which does not return the inode information. If you don't need the inode, that's a performance win. If you do, the following will help.

The NTFS equivalent of the inode is the MFT record number, otherwise known as the file ID. Its properties differ slightly, but for practical purposes it can be used the same way as an inode, i.e. to identify whether two paths point to the same file.

You can use GetFileInformationByHandle or GetFileInformationByHandleEx to retrieve this information. You will first have to call CreateFile to obtain the file handle.

  • You need FILE_READ_ATTRIBUTES rights only to get the file ID.
  • You should specify FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE as the share mode.
  • You should specify OPEN_EXISTING as the disposition.

Once you have the handle, use one of the GetFileInformation functions to obtain the file ID, then close the handle.

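The information you need is in the nFileIndexLow and nFileIndexHigh members of BY_HANDLE_FILE_INFORMATION; together with dwVolumeSerialNumber they play the role of inode number plus device ID. If ReFS is in use, a 128-bit file ID may be in use instead. To obtain that you must use the newer function, GetFileInformationByHandleEx, with the FileIdInfo information class.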

Ben
  • In the strict sense, this is an informative comment, because it doesn't answer the question at all: "How to get the inode with boost::filesystem?" – sehe May 08 '14 at 12:45
  • @sehe, in reality his question is not "How do I do this with boost?" but "I am using boost to get the inode, but it doesn't work on Windows. How can I make it work on Windows? Can I do it using boost, and if not, then how?" – Ben May 08 '14 at 12:47
  • That's valuable information, thanks! sehe is right, it doesn't directly answer the question, but "Can I do it using boost, and if not, then how?" hits the nail on the head. – Borph May 08 '14 at 19:55