
I need to process a large number of files. It must be possible to stop processing and resume it later, in a different session, from the point where it was interrupted.

The solution I was thinking of is the following:

  1. Save the path of the last processed file.
  2. On restart, walk the tree from the root, without processing anything, until I reach the saved file.
  3. Process the remaining files.

For this to work, the order in which the files are returned must be the same in both sessions. So, assuming the tree is not modified, is the order the same in both sessions? The file system is NTFS.
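The three steps above can be sketched roughly as follows. This is a minimal illustration, not a tested solution: `walk_files`, `resume`, the checkpoint file and the `process` callback are all hypothetical names. It also adds one assumption that is not part of the plan as stated: each directory's entries are sorted by name before being visited, so the order no longer depends on what the file system returns, and memory is bounded by the largest single directory rather than the whole tree.

```python
import os
from typing import Callable, Iterator, Optional


def walk_files(root: str) -> Iterator[str]:
    """Lazily yield file paths under root, one directory at a time."""
    # Assumption: sorting per directory gives a reproducible order
    # regardless of how the file system enumerates entries.
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            yield from walk_files(entry.path)
        elif entry.is_file(follow_symlinks=False):
            yield entry.path


def resume(root: str, checkpoint: str, process: Callable[[str], None]) -> None:
    """Skip files up to and including the saved one, then process the rest."""
    last_done: Optional[str] = None
    if os.path.exists(checkpoint):
        with open(checkpoint, encoding="utf-8") as f:
            last_done = f.read() or None

    skipping = last_done is not None
    for path in walk_files(root):
        if skipping:
            # Step 2: walk without processing until the saved file is reached.
            if path == last_done:
                skipping = False
            continue
        # Step 3: process the remaining files.
        process(path)
        # Step 1: save the last processed path (write, then rename over the old file).
        tmp = checkpoint + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(path)
        os.replace(tmp, checkpoint)
```

Note that if the saved file is removed or renamed between sessions, this sketch skips everything, so the "tree is not modified" assumption still matters. The checkpoint file should live outside the tree being walked.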

zdf
  • [The documentation](https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findnextfilea) has this to say: "The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed." – Igor Tandetnik Oct 02 '22 at 14:06
  • @IgorTandetnik I've already read it. It doesn't say if the order (alphabetical or not) is the same no matter how many times you query the tree (assuming the tree is not modified). – zdf Oct 02 '22 at 14:12
  • My reading of the documentation is that chances are high that the order would be the same in practice (assuming no files were added, removed or renamed in the interim), but it's not guaranteed (possibly to give the implementation leeway for future changes). You decide how comfortable you are relying on this emergent behavior. – Igor Tandetnik Oct 02 '22 at 14:14
  • Why make an assumption that filesystem is going to be the same ? Just sort the list of files before you process them. – erik258 Oct 02 '22 at 14:15
  • @erik258 I'm trying to avoid this. I do not know how many files will be processed ("x000'000") and I have to keep the memory low. – zdf Oct 02 '22 at 14:19
  • Then make the order irrelevant and keep track of the files you processed in an append-only state file somewhere. Or keep the queue of files to process somewhere persistent (Redis comes to mind, as it's light and simple). – erik258 Oct 02 '22 at 14:22
  • @erik258 I have already considered it. It means that I will have to query the file for each processed file - not promising. – zdf Oct 02 '22 at 14:28
  • You read it, we all read it, the answer to "is the order the same in both sessions" is not guaranteed, so *no*. – Simon Mourier Oct 02 '22 at 15:16
  • The order of course does not depend on the "session"; it is based only on the internal directory structure, i.e. how information about files is stored. – RbMm Oct 02 '22 at 17:26
  • There is no guarantee, but in general you can rely on it as it is. – YangXiaoPo-MSFT Oct 04 '22 at 05:22
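
For comparison, the append-only state file suggested in the comments might look something like the sketch below. `load_done` and `process_all` are hypothetical names, and the design is an assumption: the state file is read once at startup into an in-memory set, so there is no per-file query against the file on disk. The cost is one set entry per already-processed path, which may or may not fit the low-memory requirement.

```python
import os
from typing import Callable, Iterable, Set


def load_done(state_file: str) -> Set[str]:
    """Read the paths recorded by earlier sessions, one per line."""
    if not os.path.exists(state_file):
        return set()
    with open(state_file, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def process_all(paths: Iterable[str], state_file: str,
                process: Callable[[str], None]) -> None:
    """Process every path not yet recorded, in whatever order they arrive."""
    done = load_done(state_file)
    with open(state_file, "a", encoding="utf-8") as log:
        for path in paths:
            if path in done:
                continue
            process(path)
            # Record the path only after processing succeeded.
            log.write(path + "\n")
            log.flush()
```

With this variant the enumeration order no longer matters, because membership in the set, not position in the walk, decides whether a file is skipped.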

0 Answers