
I have one million text files (each about 1 kB) and a C++ program on Windows 7 that repeatedly gets a handle on a random one of them (via FindFirstFile). I believe NTFS's b+ tree directory index makes this fast, but since I know that these files are not changing, I would like this procedure to be even faster by using RAM (i.e., with no hard-drive reads after some pre-loading).

I could pre-load all one million files (ReadFile, or similarly with CreateFileMapping), but I don't know how to get the b+ tree into RAM. Is there a simple solution or API for this where I don't need to build a new tree from scratch? Is there an even faster method?

I also don't want to build a directory tree; I just want to leave all the files in one folder and leverage NTFS' tree method if possible; I'm kind of looking for a "CreateDirectoryMapping" function. (Also, is there any chance that Windows might be automatically caching the b+ tree in RAM after I first use it? I'm guessing no, which is why I want to deliberately cache it for this question.)

bobuhito
  • It is in the disk cache the first time you use it. – Raymond Chen Jun 30 '12 at 20:57
  • `std::map` where the key is the filename? – ildjarn Jun 30 '12 at 21:27
  • Thanks for the information; is there a way to tell Windows that I need to keep it in cache? If not, I can't rely on this...also, I imagine that re-directing to the cache would be slower than my directly using RAM. – bobuhito Jun 30 '12 at 21:33
  • In thinking more, I'm not sure cache is well-defined...in my comment, I was thinking about Windows caching the file pages in my extra RAM, but Raymond might have been referring to the "L3 cache" which should be faster (but I doubt it's big enough; the b+ tree alone should be around 4MB). Anyway, can anybody answer without using the cache? – bobuhito Jul 01 '12 at 01:10
  • Again, what's wrong with memory-mapping the files and using `std::map`? – ildjarn Jul 02 '12 at 02:24
  • I don't understand that, but it's the same as using CreateFileMapping, right? I thought that would just map the individual files (1 kB each), not the directory (the 4 MB b+ tree!). The problem is that even if I open a million files in RAM with the mapping, I need to quickly go from a text string (the file name) to the specific handle. This usually requires a search tree, but I don't want to build it. – bobuhito Jul 06 '12 at 00:00
  • `std::map<>` _is_ the search tree, that's the entire point. – ildjarn Jul 06 '12 at 19:42
  • Ok, I think I understand (but, as you can probably tell, I've never worked with the std::map<> container before). So, your solution basically rebuilds a new search tree from scratch, but my entire point was to avoid rebuilding it and to instead copy NTFS' tree. I'm concluding that the answer to my question is no, but that std::map<> would be the easiest way to rebuild a tree. – bobuhito Jul 07 '12 at 04:15
  • Copying NTFS's tree is, in fact, what rebuilding the tree accomplishes. Of course, you could instead use a hash table, which would be O(1). – Puppy Aug 19 '12 at 21:02
  • Ugh. You have about 1 GB of data. NTFS will use some space for each file name in the directory, plus 1 kB for each file record, plus 4 kB for the data itself. When doing random opens/reads, the b+ tree will be relatively small and will get cached. However, the 1 kB file records will likely exhaust the kernel's cache, and over the long haul you'll be reading these in. Ditto for the data. If you want maximal performance, read everything in and use a hash table to map between name and data. Depending on physical memory and other system activity, you may have some paging activity. – MJZ Nov 28 '12 at 21:18
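To illustrate the approach the last few comments converge on (read everything in up front and index it by name with a hash table), here is a minimal sketch. It is not the asker's code: it uses C++17 `std::filesystem` for the enumeration loop as a portable stand-in for a FindFirstFile/FindNextFile loop, and `build_file_index` is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

// Build an in-memory index mapping file name -> file contents.
// On Windows this enumeration would be a FindFirstFile/FindNextFile
// loop; std::filesystem::directory_iterator is the portable analogue.
std::unordered_map<std::string, std::string>
build_file_index(const std::filesystem::path& dir)
{
    std::unordered_map<std::string, std::string> index;
    for (const auto& entry : std::filesystem::directory_iterator(dir)) {
        if (!entry.is_regular_file())
            continue;
        // Slurp the whole (small, ~1 kB) file into memory once.
        std::ifstream in(entry.path(), std::ios::binary);
        std::ostringstream buf;
        buf << in.rdbuf();
        index.emplace(entry.path().filename().string(), buf.str());
    }
    return index;
}
```

After the one-time build, each lookup is an average O(1) hash probe on the file name, with no directory traversal or disk read; the trade-off is roughly 1 GB of resident memory for a million 1 kB files.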

0 Answers