13

I want to copy multiple directories with identical structure (the subdirectories have the same names) but different contents into a third location and merge them. At the same time, I want to ignore certain file extensions and not copy them.


I found that the first task alone can be handled easily by the copy_tree() function from the distutils.dir_util library. The issue is that copy_tree() cannot ignore files; it simply copies everything.

distutils.dir_util.copy_tree() - example

import distutils.dir_util

dirs_to_copy = [r'J:\Data\Folder_A', r'J:\Data\Folder_B']
destination_dir = r'J:\Data\DestinationFolder'
for files in dirs_to_copy:
    distutils.dir_util.copy_tree(files, destination_dir)
    # Succeeds in merging sub-directories but copies everything.
    # Due to time constraints, this is not an option.

For the second task (copying with the option of excluding files), the shutil library has the copytree() function. The problem there is that it cannot merge folders, since the destination directory must not already exist.

shutil.copytree() - example

import shutil

dirs_to_copy = [r'J:\Data\Folder_A', r'J:\Data\Folder_B']
destination_dir = r'J:\Data\DestinationFolder'
for files in dirs_to_copy:
    shutil.copytree(files, destination_dir, ignore=shutil.ignore_patterns("*.abc"))
    # Successfully ignores files with the "abc" extension but fails
    # at the second iteration since the destination folder already exists.
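
The failure mode described above can be reproduced in isolation. This is a minimal sketch, assuming Python 3 and throwaway temporary directories; the names src and dst are placeholders for illustration, not paths from the question:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src")  # placeholder source tree
    dst = os.path.join(root, "dst")  # placeholder destination
    os.makedirs(src)

    shutil.copytree(src, dst)        # first copy creates dst
    try:
        shutil.copytree(src, dst)    # second copy finds dst already there
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

print(second_copy_failed)
```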

Is there something that provides the best of both worlds, or do I have to code this myself?

daniel f.
Ma0
  • How about tweaking the [shutil copytree example](https://docs.python.org/2/library/shutil.html#copytree-example) so that it ignores duplicate directories by catching the error from `makedirs`? – Peter Brittain Aug 12 '16 at 23:37
  • @PeterBrittain But if I do that, it will skip the folders entirely, right? And I do need their contents. – Ma0 Aug 16 '16 at 09:06
  • No it won't. It will try to make all the folders, hit an exception (which you swallow with a new try...except handler around that one line) and then move on to the directory walk (copying files or recursing as needed). – Peter Brittain Aug 16 '16 at 10:26
  • Thanks for the hint (distutils.dir_util.copy_tree()). I have the same issue as your first one and am using copy_tree(). The problem is that some files have the same name. Do you know how to copy and rename such files the way the OS does (name(1) or name_1)? – Mohammad Javad Dec 28 '19 at 20:54

3 Answers

6

As PeterBrittain suggested, writing my own version of shutil.copytree() was the way to go. Below is the code. Note that the only difference is the wrapping of the os.makedirs() in an if block.

from shutil import copy2, copystat, Error, ignore_patterns
import os


def copytree_multi(src, dst, symlinks=False, ignore=None):
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    # -------- E D I T --------
    # Only create the destination if it does not exist yet,
    # so that repeated calls merge into the same tree.
    if not os.path.isdir(dst):
        os.makedirs(dst)
    # -------- E D I T --------

    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree_multi(srcname, dstname, symlinks, ignore)
            else:
                copy2(srcname, dstname)
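        except Error as err:
            # On Python 3, Error subclasses OSError, so it must be caught
            # first to keep the collected error list flat.
            errors.extend(err.args[0])
        except (IOError, OSError) as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError as why:
        # Copying file access times may fail on Windows; ignore that case.
        if getattr(why, 'winerror', None) is None:
            errors.append((src, dst, str(why)))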
    if errors:
        raise Error(errors)
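
For comparison, the same merge-and-ignore idea can also be written on top of os.walk. This is only a rough sketch, not the code from this answer: it assumes Python 3.2+ (for exist_ok), the name merge_trees and its patterns parameter are made up for illustration, and it does not handle symlinks or collect errors the way copytree does.

```python
import fnmatch
import os
import shutil


def merge_trees(sources, dst, patterns=()):
    """Copy each tree in `sources` into `dst`, skipping names that match `patterns`."""

    def ignored(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)

    for src in sources:
        for root, dirs, files in os.walk(src):
            # Prune ignored directories in place so os.walk skips them.
            dirs[:] = [d for d in dirs if not ignored(d)]
            rel = os.path.relpath(root, src)
            target = os.path.normpath(os.path.join(dst, rel))
            # exist_ok=True lets later sources merge into existing folders.
            os.makedirs(target, exist_ok=True)
            for name in files:
                if not ignored(name):
                    shutil.copy2(os.path.join(root, name),
                                 os.path.join(target, name))
```

With the paths from the question, a call would look like merge_trees(dirs_to_copy, destination_dir, patterns=("*.abc",)). When two sources contain a file with the same relative path, the later source overwrites the earlier one.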
Ma0
1

For those finding this now:

shutil.copytree() has a dirs_exist_ok argument as of Python 3.8. Together with the ignore argument and the ignore_patterns() helper, this can merge several directories into a third location with a single call per source:

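from shutil import copytree, ignore_patterns

# Paths taken from the question.
dirs_to_merge = [r'J:\Data\Folder_A', r'J:\Data\Folder_B']
destination = r'J:\Data\DestinationFolder'

for source in dirs_to_merge:
    copytree(source, destination, dirs_exist_ok=True, ignore=ignore_patterns('*.pyc', '*.txt'))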

This example excludes files that end in .pyc or .txt (tweaked from the docs).
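
To see the merge behaviour without touching real data, the call can be tried on temporary directories. The following is a small sketch assuming Python 3.8+; the folder and file names (Folder_A, Folder_B, sub, a.dat, b.dat, skip.abc) are invented for the demonstration.

```python
import os
import tempfile
from shutil import copytree, ignore_patterns

with tempfile.TemporaryDirectory() as root:
    # Build two hypothetical source trees that share a "sub" directory.
    for folder, filename in (("Folder_A", "a.dat"), ("Folder_B", "b.dat")):
        sub = os.path.join(root, folder, "sub")
        os.makedirs(sub)
        open(os.path.join(sub, filename), "w").close()
        open(os.path.join(sub, "skip.abc"), "w").close()

    destination = os.path.join(root, "Destination")
    for folder in ("Folder_A", "Folder_B"):
        copytree(os.path.join(root, folder), destination,
                 dirs_exist_ok=True, ignore=ignore_patterns("*.abc"))

    # Both .dat files end up in the shared sub-directory; .abc files are skipped.
    merged = sorted(os.listdir(os.path.join(destination, "sub")))
    print(merged)  # ['a.dat', 'b.dat']
```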

Mason3k
0

If you do want to use shutil directly, here's a hot patch for os.makedirs that skips the error.

import os
os_makedirs = os.makedirs
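# 0o777 is valid octal syntax on Python 2.6+ and 3; exist_ok is accepted
# (and ignored) because Python 3 callers pass it as a keyword.
def safe_makedirs(name, mode=0o777, exist_ok=False):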
    if not os.path.exists(name):
        os_makedirs(name, mode)
os.makedirs = safe_makedirs

import shutil

dirs_to_copy = [r'J:\Data\Folder_A', r'J:\Data\Folder_B']
destination_dir = r'J:\Data\DestinationFolder'
if os.path.exists(destination_dir):
    shutil.rmtree(destination_dir)
for files in dirs_to_copy:
    shutil.copytree(files, destination_dir, ignore=shutil.ignore_patterns("*.abc"))
  • But upon executing `shutil.rmtree(destination_dir)` I am going to be removing its contents too, which will be coming from a directory that will not be copied again. At the end I am going to end up with a copy of the last directory on the `dirs_to_copy` list, and everything else would have been copied and then deleted. – Ma0 Aug 16 '16 at 09:10