-2

Using the Windows API, I'm trying to write a program to read data from a disk. I managed to get access to the content of the drive using CreateFile and I'm able to search through it. Let's say there are some files on that disk and I know their paths, but I'm actually interested in their physical location.

My question is: is it possible to retrieve the physical location or address of the files (or the sectors they are stored in) on the drive without searching the whole drive? If so, what functions should I use? Neither SetFilePointer nor FindFirstFile seems to solve the problem.

  • 1
    What do you mean by raw disk? Files exist inside a file system, and you can manipulate them only through the file system. A file can span multiple sectors, need not start at a sector boundary, and may or may not be compressed or encrypted. You need to either send a request to the file system or implement the file system's functionality yourself (parse all its structures to find a file). – RbMm Nov 28 '18 at 14:41
  • Not sure if it is possible, but low level file system calls are to be found in DeviceIoControl and one of its file system codes. – gast128 Nov 28 '18 at 15:11
  • Yes, it is indeed possible, this is what file system does. If you want to bypass it, you are free to do so. – SergeyA Nov 28 '18 at 15:12
  • 2
    What *problem* are you trying to solve? This reads much like an [XY Problem](http://xyproblem.info). – IInspectable Nov 28 '18 at 15:19
  • To watch the data of the file and for example see what happens to the data when it gets deleted or modified. – Michał Recław Nov 28 '18 at 15:22
  • 3
    What for? Physical organization of data is an implementation detail of the file system implementation. If you care about notifications of file changes instead, see [Obtaining Directory Change Notifications](https://learn.microsoft.com/en-us/windows/desktop/fileio/obtaining-directory-change-notifications). – IInspectable Nov 28 '18 at 16:15

1 Answer

0

The whole point of any file system is to abstract away the physical disk sectors and give you a higher-level abstraction (files). So the answer to "Is it possible to retrieve the physical location" should, in general, be no. Some code might even move the sectors of a file: a disk defragmenter, for example, could be running concurrently with your program, even if that is not recommended.

For more, read the Wikipedia pages on file systems and files, then read a good book such as Operating Systems: Three Easy Pieces.

Notice that by using files, you expect your program to behave the same way after a file system has been moved to a different disk, provided the file paths, contents, and metadata remain the same. In particular, you could have two external USB disk enclosures with different geometries or capacities holding the same file contents (perhaps even in different file systems, e.g. VFAT on one and NTFS on the other), and you would expect your program to behave identically when accessing those files in either box. Whichever box is plugged in, your program would (for example) access the same F:\MyDir\MyFile.dat file. As file systems, both boxes would appear identical. At the physical sector level, the data would be organized very differently.

BTW, the physical organization of files inside a file system varies greatly from one file system to another. You could use an Ext3 file system on your machine (there are Ext3 drivers for Windows, which is actually useful for sharing data between Linux and Windows on a dual-boot PC), and its file organization is different from that of a FAT or an NTFS file system.

There might be some way to query the kernel for the actual physical sector location. But I am not sure it works for all file systems (what would a sector location even mean for a remote NFS one?). And that information could be stale before your program gets it (e.g. if a defragmenter is working in parallel). Also, other processes could access and modify the same file system at the same time, so metadata such as the sector location could be obsolete by the time your process is scheduled to run again.
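
On Windows specifically, Microsoft documents such a query as the FSCTL_GET_RETRIEVAL_POINTERS control code of DeviceIoControl, which fills a RETRIEVAL_POINTERS_BUFFER with the file's extents (runs of clusters). The Python sketch below only decodes such a buffer; it does not issue the call. The function name `parse_retrieval_pointers`, the `sample` bytes, and the 4096-byte cluster size are made up for illustration. The computed offsets are relative to the start of the volume, which should match NTFS; other file systems may need an extra base offset, and every caveat above about stale results still applies.

```python
import struct

def parse_retrieval_pointers(buf, bytes_per_cluster):
    """Decode a RETRIEVAL_POINTERS_BUFFER into (file_offset, volume_offset, length) runs.

    volume_offset is None when a run has no clusters on disk (LCN == -1),
    as can happen for sparse or compressed regions.
    """
    # Header: DWORD ExtentCount, 4 padding bytes, LARGE_INTEGER StartingVcn.
    extent_count, starting_vcn = struct.unpack_from("<I4xq", buf, 0)
    runs = []
    vcn = starting_vcn
    position = 16
    for _ in range(extent_count):
        # Each extent: LARGE_INTEGER NextVcn, LARGE_INTEGER Lcn.
        next_vcn, lcn = struct.unpack_from("<qq", buf, position)
        position += 16
        length = (next_vcn - vcn) * bytes_per_cluster
        volume_offset = None if lcn == -1 else lcn * bytes_per_cluster
        runs.append((vcn * bytes_per_cluster, volume_offset, length))
        vcn = next_vcn
    return runs

# Hypothetical buffer: 4 clusters starting at LCN 1000, then 2 clusters at LCN 5000.
sample = (struct.pack("<I4xq", 2, 0)
          + struct.pack("<qq", 4, 1000)
          + struct.pack("<qq", 6, 5000))

runs = parse_retrieval_pointers(sample, 4096)
for file_offset, volume_offset, length in runs:
    print(file_offset, volume_offset, length)
```

With a real buffer, the cluster size would have to come from the volume itself (for instance via GetDiskFreeSpace) rather than being hard-coded.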

On Windows and on Unix-like systems, file system code runs in the kernel, and other processes could use that same code (and the same file system) while your process is not running. Both Windows and Unix have preemptive scheduling, so you have no guarantee that your process runs again in user mode before some other process uses the same file system.

Remember that in practice, your file data often stays in the page cache. That is why you might not hear your disk working (if you still have a rotating hard disk) when accessing the same file several times in a row: if you run the same program on the same file twice, a few seconds apart, the second run usually keeps the disk silent because the file data is already in RAM.

In a comment you mention that you want

To watch the data of the file and for example see what happens to the data when it gets deleted or modified.

but that should work at the file system level. Linux has inotify(7) facilities for that (they work on most local file systems, e.g. Ext4 or BTRFS, but not on remote file systems à la nfs(5), nor on pseudo file systems à la proc(5)). I don't know if Windows has something similar to Linux inotify (but probably yes, at least in some cases).
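
To show what watching at the file system level can look like without any kernel-specific API, here is a minimal polling sketch in Python. It is not inotify and not the Windows directory change notifications linked in the comments; it simply compares two snapshots of a directory's metadata. The helper names `snapshot` and `diff_snapshots` and the file names are hypothetical.

```python
import os
import tempfile

def snapshot(directory):
    """Map each regular file name in directory to (size, mtime in ns)."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                info = entry.stat(follow_symlinks=False)
                state[entry.name] = (info.st_size, info.st_mtime_ns)
    return state

def diff_snapshots(before, after):
    """Return sorted lists of created, deleted, and modified file names."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    modified = sorted(name for name in before.keys() & after.keys()
                      if before[name] != after[name])
    return created, deleted, modified

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.dat", "b.dat"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("original")
    before = snapshot(folder)

    # Simulate another program changing the directory.
    with open(os.path.join(folder, "a.dat"), "a") as f:
        f.write(" plus more")
    os.remove(os.path.join(folder, "b.dat"))
    with open(os.path.join(folder, "c.dat"), "w") as f:
        f.write("new")

    changes = diff_snapshots(before, snapshot(folder))

print(changes)
```

Polling only tells you that something changed between two snapshots, not what happened in between, which is the main reason to prefer a real notification facility when one is available.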

You should probably consider using a database (maybe as simple as SQLite). If you also need ACID guarantees across several concurrently running programs, use a client-server RDBMS like PostgreSQL. With PostgreSQL you can use triggers to be made aware that some data changed, even if some other program changes the same database.
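
A rough illustration of the trigger idea, using SQLite through Python's standard `sqlite3` module rather than PostgreSQL (an assumption made only to keep the sketch self-contained). The table names `records` and `change_log` and the trigger names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
CREATE TABLE change_log (
    record_id INTEGER, action TEXT, old_payload TEXT, new_payload TEXT
);
-- Record every update and delete, whichever program performs it.
CREATE TRIGGER log_update AFTER UPDATE ON records
BEGIN
    INSERT INTO change_log VALUES (OLD.id, 'update', OLD.payload, NEW.payload);
END;
CREATE TRIGGER log_delete AFTER DELETE ON records
BEGIN
    INSERT INTO change_log VALUES (OLD.id, 'delete', OLD.payload, NULL);
END;
""")

with conn:  # one transaction: commits on success, rolls back on error
    conn.execute("INSERT INTO records (id, payload) VALUES (1, 'first version')")
    conn.execute("UPDATE records SET payload = 'second version' WHERE id = 1")
    conn.execute("DELETE FROM records WHERE id = 1")

log = conn.execute("SELECT * FROM change_log ORDER BY rowid").fetchall()
for row in log:
    print(row)
```

Here the triggers only write to a log table that an interested program would have to read; they answer "what happened to the data when it was modified or deleted", which is what the comment asks about.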

You could also do some file locking, and adopt the convention that every program accessing your particular file should lock it appropriately.
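
A minimal sketch of such a convention in Python, assuming every cooperating program goes through the same helper. The name `locked` is hypothetical. On Unix-like systems it uses `fcntl.flock`, which is advisory, so programs that ignore the convention are not stopped; the Windows branch uses `msvcrt.locking` on the first byte of the file and is included as an untested assumption about how the same idea would map there.

```python
import contextlib
import os
import tempfile

try:
    import fcntl           # Unix-like systems
except ImportError:        # Windows
    fcntl = None
    import msvcrt

@contextlib.contextmanager
def locked(path):
    """Open path with an exclusive lock; raise OSError if it is already locked."""
    f = open(path, "r+b")
    try:
        if fcntl is not None:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        else:
            f.seek(0)
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
        try:
            yield f
        finally:
            # Release the lock explicitly before the file is closed.
            if fcntl is not None:
                fcntl.flock(f, fcntl.LOCK_UN)
            else:
                f.seek(0)
                msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
    finally:
        f.close()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
try:
    with locked(path):
        try:
            # A second, independent open of the same file must not get the lock.
            with locked(path):
                second_attempt = "acquired"
        except OSError:
            second_attempt = "refused"
finally:
    os.remove(path)

print(second_attempt)
```

Note that this locks the file, not its sectors: it says nothing about where the data lives on disk, and the file system remains free to move it.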

Basile Starynkevitch
  • 1
    The question is too broad to answer in detail. A short answer to the question is 'Yes, it is possible'. – SergeyA Nov 28 '18 at 15:12
  • Only for a *particular* file system (probably NTFS or FAT). Perhaps not for some Ext3 one on an external disk. – Basile Starynkevitch Nov 28 '18 at 15:13
  • Why not? You can still read the disk as a device, and parse ext3 metadata. If the file can be read through normal API, it can by definition be read directly from the disk. – SergeyA Nov 28 '18 at 15:14
  • Because the kernel code would modify that data (e.g. because it writes new files for *other* processes) while you are reading it. – Basile Starynkevitch Nov 28 '18 at 15:16
  • Obviously you'd have to issue appropriate locking while doing so. – SergeyA Nov 28 '18 at 15:17
  • That was not in the question, and is not easily doable in general. Also, locking works at the file level, not at the physical sector level. Some file system code might move the sectors of a file while it is being accessed. – Basile Starynkevitch Nov 28 '18 at 15:19
  • 2
    The short answer is "Yes, it is sometimes possible to varying degrees", but Basile's answer is far superior to that. – Lightness Races in Orbit Nov 28 '18 at 15:21
  • 1
    Obviously, if you are implementing a file system (this is what the question essentially boils down to), you have to behave like a file system. In particular, you need to lock the device at the kernel level while you are working with it. I do not know how to do this in Windows, but I have no doubt it is possible, because this is what file system drivers do. – SergeyA Nov 28 '18 at 15:23
  • I don't know Windows either. But I happen to know Linux (and have been using it since 1993). A rule of thumb is that using low-level utilities which might move sectors (like `debugfs` or `fsck` ...) on a write-mounted filesystem is a bad idea. – Basile Starynkevitch Nov 28 '18 at 15:24
  • On Linux (which I happen to be familiar with as well) you'd write a kernel driver for this task. I am also not familiar with referring to `debugfs` as a `low-level utility` akin to fsck, but what do I know. – SergeyA Nov 28 '18 at 15:30
  • But that kernel driver would probably be file system specific. Would you write the same code for Ext4 and for FAT? And on Linux, to watch changing data, [inotify(7)](http://man7.org/linux/man-pages/man7/inotify.7.html) could be simpler (and would not require kernel code). – Basile Starynkevitch Nov 28 '18 at 15:31
  • @LightnessRacesinOrbit: you praise my answer but it stays downvoted! – Basile Starynkevitch Nov 28 '18 at 15:38