I am currently in the position where I need to rename all files in a directory. The chance that a file does not change name is minimal, and the chance that an old filename is the same as a new filename is considerable, making renaming conflicts likely.

Thus, simply looping over the files and renaming old->new is not an option.

The easy / obvious solution is to rename everything to have a temporary filename: old->tempX->new. Of course, to some degree, this shifts the issue, because now there is the responsibility of checking nothing in the old names list overlaps with the temporary names list, and nothing in the temporary names list overlaps with the new list.
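
A minimal Python sketch of choosing temporary names that cannot overlap with either list. The helper name `make_temp_names` and the `__tmp_rename_` prefix are made up for illustration, and the sketch assumes every file in the directory appears in the old-names list.

```python
import itertools


def make_temp_names(old_names, new_names, prefix="__tmp_rename_"):
    """Map each old name to a temporary name that is in neither list."""
    taken = set(old_names) | set(new_names)
    counter = itertools.count()
    temps = {}
    for old in old_names:
        candidate = f"{prefix}{next(counter)}"
        # Skip any candidate that collides with an old, new, or earlier temp name.
        while candidate in taken:
            candidate = f"{prefix}{next(counter)}"
        taken.add(candidate)
        temps[old] = candidate
    return temps
```

The two overlap checks collapse into a single set lookup, so this part costs nothing on disk.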

Additionally, since I'm dealing with slow media and virus scanners that love to slow things down, I would like to minimize the actual actions on disk. Besides that, the user will be impatiently waiting to do more stuff. So if at all possible, I would like to process all files on disk in a single pass (by smartly re-ordering rename operations) and avoid exponential time shenanigans.

This last bit has brought me to a 'good enough' solution: I first create a single temporary directory inside my directory, move-rename everything into it, and finally move everything back into the original folder and delete the temporary directory. This gives me a complexity of O(2n) for on-disk actions.
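
For reference, the temporary-directory approach could be sketched in Python roughly as follows. The function name `rename_via_temp_dir` is hypothetical. The sketch assumes the new names are all distinct, that the staging directory is on the same filesystem, and that its generated name does not clash with any new filename.

```python
import os
import tempfile


def rename_via_temp_dir(directory, mapping):
    """mapping: old filename -> new filename, both relative to directory."""
    staging = tempfile.mkdtemp(dir=directory)
    # Pass 1: move every file into the staging directory under its new name.
    for old, new in mapping.items():
        os.rename(os.path.join(directory, old), os.path.join(staging, new))
    # Pass 2: move every file back; all old names are gone, so nothing conflicts.
    for new in mapping.values():
        os.rename(os.path.join(staging, new), os.path.join(directory, new))
    os.rmdir(staging)
```

Each file is touched twice, which is where the 2n disk operations come from.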

If possible, I'd love to get the on-disk complexity to O(n), even if it comes at a cost of increasing the in-memory actions to O(99999n). Memory is a lot faster after all.

I am personally not at-home enough in graph theory, and I suspect the entire 'rename conflict' thing has been tackled before, so I was hoping someone could point me towards an algorithm that meets my needs. (And yes, I can try to brew my own, but I am not smart enough to write an efficient algorithm, and I probably would leave in a logic bug that rears its ugly head rarely enough to slip through my testing. xD)

Stigma
  • Is it OK to read the entire file structure to memory, decide on new names in memory, then write them all? – Yossi Vainshtein May 04 '17 at 07:25
  • I already have a list of old and new names in memory, so that is not a problem. Memory is not a limitation I am concerned about... within reason anyway. – Stigma May 04 '17 at 08:08

3 Answers

One approach is as follows.

If file A renames to B and B is a new name (no existing file is called B), we can simply rename A.

If file A renames to B, B renames to C, and C is a new name, we can follow the list in reverse: rename B to C, then A to B.

In general this works provided there is no loop. Simply make a list of all the dependencies and then rename in reverse order.
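
As a small illustration of the loop-free case only, here is a Python sketch that orders the renames so every target name is free when its turn comes. The name `order_chain_renames` is made up, and the function assumes the mapping contains no loops; with a loop it still terminates but the order it returns is not safe.

```python
def order_chain_renames(mapping):
    """Return (old, new) pairs in a safe order. Assumes mapping has no loops."""
    ordered = []
    done = set()
    for start in mapping:
        chain = []
        name = start
        # Follow A -> B -> C ... until we reach a new name or a handled file.
        while name in mapping and name not in done:
            chain.append(name)
            done.add(name)
            name = mapping[name]
        # Rename the end of the chain first, then work back to the start.
        ordered.extend((old, mapping[old]) for old in reversed(chain))
    return ordered
```

For `{"A": "B", "B": "C"}` this yields B to C first, then A to B.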

If there is a loop we have something like this:

A renames to B
B renames to C
C renames to D
D renames to A

In this case we need a single temporary file per loop.

Rename the first in the loop, A to ATMP. Then our list of modifications becomes:

ATMP renames to B
B renames to C
C renames to D
D renames to A

This list no longer has a loop so we can process the files in reverse order as before.

The total number of file moves with this approach will be n + number of loops in your rearrangement.
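
The count can be worked out in memory before touching the disk. The following is a rough sketch with a hypothetical helper `count_moves`; it assumes all new names are distinct and that no entry maps a name to itself.

```python
def count_moves(mapping):
    """Return n + number of loops for an old -> new name mapping."""
    loops = 0
    seen = set()
    for start in mapping:
        name = start
        while name in mapping and name not in seen:
            seen.add(name)
            name = mapping[name]
            if name == start:
                # We walked back to where we began, so this is a loop.
                loops += 1
    return len(mapping) + loops
```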

Example code

So in Python this might look like this:

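D = {1: 2, 2: 3, 3: 4, 4: 1, 5: 6, 6: 7, 10: 11}  # Map from start name to final name

moved = set()              # Names that have already been renamed away
filenames = set(D.keys())  # Names in use before any rename happens
tmp = 'tmp file'           # Temporary name, reused for every loop

def rename(start, dest):
    moved.add(start)
    print('Rename {} to {}'.format(start, dest))

for start in list(D.keys()):  # Copy the keys, because D gains a tmp entry below
    if start in moved:
        continue
    A = []  # List of files to rename
    p = start
    while True:
        A.append(p)
        dest = D[p]
        if dest not in filenames or dest in moved:
            # dest is a new name, or its old owner has already been renamed
            break
        if dest == start:
            # Found a loop: park the first file under the temporary name
            D[tmp] = D[start]
            rename(start, tmp)
            A[0] = tmp
            break
        p = dest
    # Rename from the end of the chain back to the start
    for f in A[::-1]:
        rename(f, D[f])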

This code prints:

Rename 1 to tmp file
Rename 4 to 1
Rename 3 to 4
Rename 2 to 3
Rename tmp file to 2
Rename 6 to 7
Rename 5 to 6
Rename 10 to 11
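
To perform such a list of renames on disk instead of printing it, something along these lines might do. `apply_renames` is a hypothetical helper, not part of the code above. The existence check is there because `os.rename` silently replaces an existing destination file on POSIX systems, while on Windows it raises an error.

```python
import os


def apply_renames(directory, operations):
    """Apply (source, dest) renames in order, refusing to overwrite anything."""
    for source, dest in operations:
        src_path = os.path.join(directory, str(source))
        dst_path = os.path.join(directory, str(dest))
        if os.path.exists(dst_path):
            # A correct ordering never hits this; treat it as a planning bug.
            raise FileExistsError(dst_path)
        os.rename(src_path, dst_path)
```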
Peter de Rivaz
  • I went with your answer because you covered all the different sides of the issue in depth: chains as well as the possibility of cycles are demonstrated and shown. This allowed me to wrap my head around the problem most clearly while implementing a solution fitting my needs in my own project, adding a simple transaction log with rollback capability in case of errors. Thank you for taking the time to write such an in-depth answer with example code. :-) – Stigma May 06 '17 at 00:40

Looks like you're looking at a sub-problem of topological sort. However, it's simpler, since each file can depend on just one other file. Assuming that there are no loops:

Supposing map is the mapping from old names to new names:

In a loop, just select any file to rename, and send it to a function which:

  1. if its destination name is not conflicting (no file with the new name exists), just renames it
  2. else (a conflict exists)

    2.1 renames the conflicting file first, by sending it to the same function recursively

    2.2 renames this file

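A Java sketch would look like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Renamer {
    // map is the map, map.get(oldName) = newName; assumes there are no loops
    static void renameAll(Map<String, String> map) throws IOException {
        Set<String> oldNames = new HashSet<>(map.keySet());
        while (!oldNames.isEmpty()) {
            String file = oldNames.iterator().next(); // Just selects any filename from the set
            renameFile(map, oldNames, file);
        }
    }

    static void renameFile(Map<String, String> map, Set<String> oldNames, String file)
            throws IOException {
        String target = map.get(file);
        if (oldNames.contains(target)) {
            // The target name is still taken: rename the conflicting file first
            renameFile(map, oldNames, target);
        }
        Files.move(Paths.get(file), Paths.get(target)); // actual renaming of file on disk
        map.remove(file);
        oldNames.remove(file);
    }
}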
Yossi Vainshtein
  • I think you're right, but it needs a small improvement to detect loops. Your code will overflow the stack (not a pun, really) when trying to rename 1.txt->2.txt and 2.txt->1.txt – Byzod Aug 27 '21 at 00:20
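
One possible way to add the loop detection mentioned in the comment, sketched in Python. It only builds a list of renames rather than touching the disk. The names `plan_renames`, `in_progress`, `parked`, and the placeholder `__rename_tmp__` are all assumptions for illustration, and the temporary name is assumed not to clash with any real filename.

```python
def plan_renames(mapping, temp_name="__rename_tmp__"):
    """Return (source, target) renames in a safe order, loops included."""
    pending = dict(mapping)  # renames that still have to happen
    parked = {}              # original name -> temporary name it was parked under
    plan = []

    def rename_file(name, in_progress):
        in_progress.add(name)
        target = pending[name]
        if target in in_progress:
            # The chain loops back onto a file we are already handling:
            # park that file under the temporary name to free its name.
            plan.append((target, temp_name))
            parked[target] = temp_name
        elif target in pending:
            # The target name is still taken: rename that file first.
            rename_file(target, in_progress)
        source = parked.pop(name, name)
        plan.append((source, target))
        del pending[name]

    while pending:
        rename_file(next(iter(pending)), set())
    return plan
```

For `1.txt->2.txt` and `2.txt->1.txt` this produces three renames instead of recursing forever. Very long chains could still hit Python's recursion limit, so an iterative walk may be preferable there.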

I believe you are interested in a Graph Theory modeling of the problem so here is my take on this:

You can build the bidirectional mapping of old file names to new file names as a first stage.

Now, you compute the intersection set I of the old filenames and the new filenames. Each new filename appearing in this set requires the existing file with that name to be renamed first. This is a dependency relationship that you can model in a graph.

Now, to build that graph, we iterate over that I set. For each element e of I:

  • Insert a vertex in the graph representing the file e needing to be renamed if it doesn't exist yet
  • Get the "old filename" o that has to be renamed into e
  • Insert a vertex representing o into the graph if it doesn't already exist
  • Insert a directed edge (e, o) in the graph. This edge means "e must be renamed before o". If that edge introduces a cycle (*), do not insert it and mark o as a file that needs to be moved-and-renamed.

Now iterate over the roots of your graph (vertices that have no in-edges), perform a BFS from each of them, and do the renaming each time you discover a vertex. The renaming is either a plain rename or a move-and-rename, depending on whether the vertex was tagged.

The last step is to move the moved-and-renamed files back from their sandbox directory to the target directory.
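
The steps above could be sketched in Python roughly like this. The sketch only produces a list of operations. The function name `plan_with_graph` and the `__sandbox__` directory name are assumptions, and as a simplification every old filename is treated as a vertex, so files outside the intersection show up as isolated roots. Cycle detection simply walks the existing edges, which works because each vertex has at most one outgoing edge.

```python
from collections import deque


def plan_with_graph(mapping, sandbox="__sandbox__"):
    """mapping: old -> new. Returns (source, destination) operations in order."""
    source_of = {new: old for old, new in mapping.items()}  # inverse mapping
    intersection = set(mapping) & set(source_of)

    successor = {}       # edge e -> o, meaning "e must be renamed before o"
    has_in_edge = set()
    sandboxed = set()    # files tagged for move-and-rename
    for e in intersection:
        o = source_of[e]
        # Edge (e, o) closes a cycle exactly when o already reaches e.
        node = o
        while node in successor and node != e:
            node = successor[node]
        if node == e:
            sandboxed.add(o)
        else:
            successor[e] = o
            has_in_edge.add(o)

    plan, move_back = [], []
    queue = deque(name for name in mapping if name not in has_in_edge)  # roots
    while queue:
        name = queue.popleft()
        new = mapping[name]
        if name in sandboxed:
            plan.append((name, f"{sandbox}/{new}"))
            move_back.append((f"{sandbox}/{new}", new))
        else:
            plan.append((name, new))
        if name in successor:
            queue.append(successor[name])
    # Last step: bring the sandboxed files back into the target directory.
    return plan + move_back
```

Only one file per cycle takes the detour through the sandbox, so the number of operations is n plus the number of cycles.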

C++ Live Demo to illustrate the graph processing.

Rerito
  • This is a very good explanation. Since I was a bit more focused on implementing a working solution, which another answer helped more with, I can't select you as 'The Answer', but I want to compliment you on a great post either way. Thank you. – Stigma May 06 '17 at 00:43