I'm trying to list all the files and folders on a mounted NTFS volume. I have written two implementations so far, and (unfortunately) they yield different results.
(NOTE: I couldn't include additional sources here because of the link limit.)
There are a few things I would like cleared up:
(1) Why do certain files/folders have strange, unrecognizable characters in the middle of their names? How do I write them to a wstringstream, and how would I then properly write them to a wofstream?
Example file path: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https∺∯∯wscont.apps.microsoft.com∯winstore∯6.3.0.1∯100∯US∯en-us∯MS∯482∯features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use dir in cmd: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https???wscont.apps.microsoft.com?winstore?6.3.0.1?100?US?en-us?MS?482?features1908650c-22a4-485e-8e88-b12d01c84f2f.json.dat
How it appears if you were to use wprintf in C++: C:\Users\Rahul\AppData\Local\Packages\winstore_cw5n1h2txyewy\LocalState\Cache\4\4-https
The file name displays properly in Windows Explorer, but cmd has trouble printing it. It appears as a box in Notepad++, but if you right-click, it shows properly, so Notepad++ can also display the characters (sort of; maybe after an encoding change?).
I'm currently using the following (ss is a wstringstream):
wstringstream ss("");
(my program methods here)
wofstream out("...", wofstream::out);
out << ss.rdbuf();
out.close();
I'm assuming that the encoding has at least something to do with it, but at the same time, I'm not sure which flags to use.
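To make the encoding question concrete, here is a minimal sketch of one possible workaround: skip the wofstream locale conversion entirely, convert the wide string to UTF-8 by hand, and write the bytes through a binary ofstream. The names append_utf8, to_utf8, and write_utf8_file are hypothetical helpers, not part of my program, and the sketch assumes the names are UTF-16 (as on Windows) or UTF-32 code units stored in a std::wstring.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 and append it to out.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Surrogate pairs (16-bit wchar_t) are combined into one code point;
// an unpaired surrogate is passed through as a 3-byte sequence.
std::string to_utf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical helper: write the text as raw UTF-8 bytes.
// Binary mode keeps the stream from altering the bytes.
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::string bytes = to_utf8(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

With something like this, the call would be write_utf8_file(somePath, ss.str()) instead of streaming ss.rdbuf() into a wofstream. I have not confirmed that this is the recommended approach; imbuing the wofstream with a UTF-8 locale may be the more conventional route.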
(2) Are all files listed in the MFT? Every source I have read on NTFS says that all file information and attributes are stored in the MFT. According to the open source NTFSLib (I've hit the link limit; it can be found by googling An-NTFS-Parser-Lib), there are 131840 file records.
When I run my own program, I end up with a 50MB file (it includes permissions and such). My program uses FSCTL_MFT_ENUM_USN_DATA, CreateFile for handles, and GetFileInformationByHandle for extended information. CreateFile takes the WCHAR* path as-is and doesn't seem to have the odd null-termination issues (I think; I'm no longer sure, and this might be where the missing files are).
It shows 129454 files that it could read. I'm assuming that the other 131840 - 129454 = 2386 records are files that were deleted but are still in the USN journal.
(3) Why does my Java version of the code output more file records than the MFT even contains?
The output of my Java code is a 150MB file (includes permissions, enumerates with names instead of symbols because I don't know how to not do that, so it's way bigger).
There are 161430 file records in this one, which is more than NTFSLib reports. Yes, many of those 131840 file records are probably 'additional names', but I explicitly avoided symlinks in my Java version. Are those roughly 30000 extra entries generated by hard links, or is having more names somehow independent of being a symlink?
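One check that might help test the hard-link hypothesis is sketched below: group the enumerated paths by file index and compare the number of names with the number of distinct records. Entry, Counts, and count_names_and_records are hypothetical names. The sketch assumes each entry's fileIndex was built from nFileIndexHigh and nFileIndexLow as returned by GetFileInformationByHandle, and that all entries come from the same volume.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical record for one enumerated path.
struct Entry {
    std::uint64_t fileIndex;  // assumed: (nFileIndexHigh << 32) | nFileIndexLow
    std::wstring path;        // full path as enumerated
};

struct Counts {
    std::size_t names;    // number of paths seen
    std::size_t records;  // number of distinct file indexes
};

// Hard links give several paths the same file index, so
// names > records would suggest extra names rather than extra files.
Counts count_names_and_records(const std::vector<Entry>& entries) {
    std::unordered_set<std::uint64_t> seen;
    for (const Entry& e : entries) {
        seen.insert(e.fileIndex);
    }
    return Counts{entries.size(), seen.size()};
}
```

If names comes out near 161430 while records stays close to the MFT count, hard links would be a plausible explanation for the difference. I have not run this against my data yet.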