20

I have several files across several folders like this:

dir
├── 0
│   ├── 103425.xml
│   ├── 105340.xml
│   ├── 109454.xml
│
│── 1247
│   └── doc.xml
├── 14568
│   └── doc.xml
├── 1659
│   └── doc.xml
├── 10450
│   └── doc.xml
├── 10351
│   └── doc.xml

How can I extract all the documents into a single folder appending the folder name for each moved document:

new_dir
├── 0_103425.xml
├── 0_105340.xml
├── 0_109454.xml
├── 1247_doc.xml
├── 14568_doc.xml
├── 1659_doc.xml
├── 10450_doc.xml
├── 10351_doc.xml

I tried to extract them with:

import os

for path, subdirs, files in os.walk('../dir/'):
    for name in files:
        print(os.path.join(path, name))

UPDATE

Also, I tried to:

import os, shutil
from glob import glob

files = []
start_dir = os.getcwd()
pattern   = "*.xml"

for dir,_,_ in os.walk('../dir/'):
    files.extend(glob(os.path.join(dir,pattern))) 
for f in files:
    print(f)
    shutil.move(f, '../dir/')

The above gave me the path of each file. However, I do not understand how to rename and move them:

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-50-229e4256f1f3> in <module>()
     10 for f in files:
     11     print(f)
---> 12     shutil.move(f, '../dir/')

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/shutil.py in move(src, dst, copy_function)
    540         real_dst = os.path.join(dst, _basename(src))
    541         if os.path.exists(real_dst):
--> 542             raise Error("Destination path '%s' already exists" % real_dst)
    543     try:
    544         os.rename(src, real_dst)

Error: Destination path '../data/230948.xml' already exists

The above error shows why I would like to rename it with its folder.

tumbleweed
  • 4,624
  • 12
  • 50
  • 81

2 Answers2

10

How does this work for you?

import os
import pathlib

OLD_DIR = 'files'
NEW_DIR = 'new_dir'

p = pathlib.Path(OLD_DIR)
for f in p.glob('**/*.xml'):
    new_name = '{}_{}'.format(f.parent.name, f.name)
    f.rename(os.path.join(NEW_DIR, new_name))

If you don't have a modern version of Python (3.5+) you can also just use glob, os, and shutil:

import os
import glob
import shutil


for f in glob.glob('files/**/*.xml'):
    new_name = '{}_{}'.format(os.path.basename(os.path.dirname(f)), os.path.basename(f))
    shutil.move(f, os.path.join('new_dir', new_name))
Wayne Werner
  • 49,299
  • 29
  • 200
  • 290
8

This is easiest to do with Python 3's new pathlib module for path operations, and then shutil.move for moving the files into their correct places. Unlike os.rename, shutil.move will work like the mv command and behave correctly even for cross-filesystem moves.

This code will work for paths nested to any level - any / or \ in the paths will be replaced with _ in the target filename, so dir/foo/bar/baz/xyzzy.xml will be moved to new_dir/foo_bar_baz_xyzzy.xml.

from pathlib import Path
from shutil import move

src = Path('dir')
dst = Path('new_dir')

# create the target directory if it doesn't exist
if not dst.is_dir():
    dst.mkdir()

# go through each file
for i in src.glob('**/*'):
    # skip directories and alike
    if not i.is_file():
        continue

    # calculate path relative to `src`,
    # this will make dir/foo/bar into foo/bar
    p = i.relative_to(src)

    # replace path separators with underscore, so foo/bar becomes foo_bar
    target_file_name = str(p).replace('/', '_').replace('\\', '_')

    # then do rename/move. shutil.move will always do the right thing
    # note that it *doesn't* accept Path objects in Python 3.5, so we
    # use str(...) here. `dst` is a path object, and `target_file_name
    # is the name of the file to be placed there; we can use the / operator
    # instead of os.path.join.
    move(str(i), str(dst / target_file_name))