11

I have some questions about copying a folder structure. I need to convert PDF files to text files, and the PDFs I import are stored in a folder structure like this:

D:/f/subfolder1/subfolder2/a.pdf 

I would like to create the same folder structure under "D:/g/subfolder1/subfolder2/", but without the PDF file, because the converted text file goes there instead. After the conversion function runs, the result should be:

D:/g/subfolder1/subfolder2/a.txt

I would also like to add an if check so that the folder structure is only created when it does not already exist under "D:/g/".

Here is my current code. How can I create the same folder structure without the files?

Thank you!

import converter as c
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
      with open("D:/g/"+ ,mode="w") as newfile:
          newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
martineau
SXC88

3 Answers

22

For me the following works fine:

  • Iterate over existing folders

  • Build the structure for the new folders based on existing ones

  • Check whether the new folder already exists
  • If it does not, create the new folder without files

Code:

import os

inputpath = 'D:/f/'
outputpath = 'D:/g/'

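# os.walk is top-down by default, so each parent is created before its children
for dirpath, dirnames, filenames in os.walk(inputpath):
    # Slicing assumes inputpath ends with a path separator
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    else:
        print("Folder already exists!")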

Documentation:

linusg
  • That is great! ;)) – SXC88 Nov 27 '16 at 14:11
  • 2
    Caveat: The `dirpath[len(inputpath):]` assumes `inputpath` ends with a pathname component separator (like `'/'`), which—while it does match what's shown in the question—generally isn't necessary when specifying directory paths (so often **won't** be there but this code counts on it). – martineau Dec 10 '18 at 09:19
  • 4
    To avoid the possible issue with trailing separators mentioned in my last comment, I would suggest using `os.path.relpath(dirpath, inputpath)` (instead of `dirpath[len(inputpath):]`) which would work in either case. Goes to show, managing paths as if they were mere strings is asking for trouble, so doing it this way avoids doing that and the potential problem. – martineau Dec 10 '18 at 10:35
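The `os.path.relpath` suggestion from the comments could be sketched as follows. This is a minimal sketch, assuming Python 3; the function name `mirror_tree` and its return value are hypothetical and not part of the original answer.

```python
import os

def mirror_tree(src_root, dst_root):
    """Recreate the directory layout of src_root under dst_root, copying no files.

    Returns the list of directories that were newly created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # relpath works whether or not src_root ends with a separator;
        # it is "." for the root itself, so normalise the joined path.
        rel = os.path.relpath(dirpath, src_root)
        target = os.path.normpath(os.path.join(dst_root, rel))
        # Only create the folder if it is not already there.
        if not os.path.isdir(target):
            os.makedirs(target)
            created.append(target)
    return created
```

With the paths from the question this would be called as `mirror_tree('D:/f', 'D:/g')`; a second call on the same pair should create nothing.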
10

How about using shutil.copytree()?

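import os
import shutil

def ig_f(dir, files):
    # Ignore every regular file so that only directories are copied
    return [f for f in files if os.path.isfile(os.path.join(dir, f))]

# inputpath and outputpath as defined in the question
shutil.copytree(inputpath, outputpath, ignore=ig_f)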

The destination directory must not exist before calling this function. You can add a check for that.

Taken from shutil.copytree without files
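The existence check mentioned above might look like the sketch below. The names `ignore_files` and `copy_structure` are hypothetical, chosen here for illustration. On Python 3.8 or later, `shutil.copytree` also accepts `dirs_exist_ok=True`, which lets it copy into an existing destination instead of raising an error.

```python
import os
import shutil

def ignore_files(directory, names):
    """Tell copytree to skip every entry that is a regular file."""
    return [n for n in names if os.path.isfile(os.path.join(directory, n))]

def copy_structure(src_root, dst_root):
    """Copy only the directories of src_root to dst_root.

    Returns False without touching anything if dst_root already exists.
    """
    if os.path.exists(dst_root):
        return False
    shutil.copytree(src_root, dst_root, ignore=ignore_files)
    return True
```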

kumardeepakr3
  • This is what I understood: You have srcDir which has some pdf files. And you have a dstDir in which you want the .txt converted files. Also you want to preserve the directory structure. And want the destination directory to have the same directory structure as source directory. What am I getting wrong or what additional things do you need? – kumardeepakr3 Nov 27 '16 at 11:58
  • Traceback (most recent call last): File "C:/Users/sxc/Desktop/python file/pdf converter/pdfminer-20140328/b.py", line 12, in shutil.copytree(inputpath, outputpath, ignore=ig_f) File "C:\Python27\lib\shutil.py", line 177, in copytree os.makedirs(dst) File "C:\Python27\lib\os.py", line 157, in makedirs mkdir(name, mode) WindowsError: [Error 183] : 'D:/g/' it throws me this error message. and also I would like to test with if statement if the folder structure already exists under "D:/g/" before creating – SXC88 Nov 27 '16 at 12:02
  • 1
    The folder `D:/g/` must not exist when calling copytree() function. The error is because of that. Try removing that directory before executing the code. – kumardeepakr3 Nov 27 '16 at 12:31
1

A minor tweak to your code for skipping pdf files:

for root, dirs, files in os.walk('.', topdown=False):
    for name in files:
        if name.find(".pdf") >=0: continue
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
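If the goal is instead to convert each PDF and write the text file into the mirrored folder, as the question describes, one possible sketch is shown below. It assumes Python 3 and that the conversion function returns the extracted text as a string; `convert_tree` and its `convert` parameter are hypothetical names introduced here.

```python
import os

def convert_tree(src_root, dst_root, convert):
    """Mirror src_root under dst_root and write one .txt per .pdf found.

    convert is any callable that takes a PDF path and returns its text.
    Returns the list of text files written.
    """
    written = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        # exist_ok=True makes the call a no-op when the folder is already there
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # only PDF files are converted
            out_path = os.path.join(target_dir, stem + ".txt")
            with open(out_path, mode="w", encoding="utf-8") as newfile:
                newfile.write(convert(os.path.join(dirpath, name)))
            written.append(out_path)
    return written
```

With the question's module this would presumably be called as `convert_tree('D:/f/', 'D:/g/', c.convert_pdf_to_txt)`.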
VBB