This is for Python 2.

I have a chunk of code that creates an object (dtry) containing three identical lists. Each list holds all of the files (excluding folders) within a folder. This works, but I want to extend it to also cover subfolders.

My working code is as follows:

import os

fldr = r"C:\Users\jonsnow\OneDrive\Documents\my_python\Testing\Testing"  # raw string keeps the backslashes literal
dtry[:] = []  # clear list

for i in range(3):
        dtry.append([tup for tup in os.listdir(fldr)
                     if os.path.isfile(os.path.join(fldr, tup))])

This successfully creates the three lists containing the names but not full paths of files (and only files, not folders) inside fldr.

I want this to also search within the subfolders of fldr.

Unfortunately I can't figure out how to get it to do so.

I have cobbled together another piece of code that does list all of the files in the subfolders as well (and so kind of works), but it lists the full paths not just the file names. This is as follows:


import os

fldr = r"C:\Users\jonsnow\OneDrive\Documents\my_python\Testing\Testing"  # raw string keeps the backslashes literal
dtry[:] = []  # clear list

for i in range(3):
        dtry.append([os.path.join(root, name)
                     for root, dirs, files in os.walk(fldr)
                     for name in files
                     if os.path.isfile(os.path.join(root, name))])

I have tried changing the line:

dtry.append([os.path.join(root, name)

to

tup for tup in os.listdir(fldr)

but this is not working for me.

Can anyone tell me what I am missing here?

Again, I am trying to get dtry to be three lists, each list being all of the files within fldr and the files within all of its subfolders.

askingaq
  • Have you considered using `os.walk()`? It yields a tuple of (dirpath, dirnames, filenames) for each directory. That will simplify your iteration and help you focus on the part you want. – perennial_noob Apr 02 '19 at 23:39
  • os.walk is a good choice, as @askingaq says. This is a problem that is also often tackled with recursion. Your function processes only one directory, but then calls itself when it comes across subdirectories. – CryptoFool Apr 02 '19 at 23:43
  • I don't understand the 1,2,3 loop. You don't use 'i' in your loop. What is that supposed to do? Do you just want three identical results? If so, it's better to do all the work once and then copy the result two times. – CryptoFool Apr 02 '19 at 23:45
  • Yep, I just want three identical results. I am doing different stuff to each sub-list later on. I am happy to take your advice and do that at a later stage. You mention using os.walk, as I am in my second code block that produces lists for full file paths. Where else should I be using os.walk? Could you kindly be a bit more explicit as there is something I am missing here. – askingaq Apr 02 '19 at 23:50

2 Answers

1

Here's the simplest way I can think of to get all of the filenames without any subpaths, using just os.listdir():

import os
from pprint import pprint

def getAllFiles(dir, result = None):
    if result is None:
        result = []
    for entry in os.listdir(dir):
        entrypath = os.path.join(dir, entry)
        if os.path.isdir(entrypath):
            getAllFiles(entrypath, result)
        else:
            result.append(entry)
    return result

def main():
    result = getAllFiles("/tmp/foo")
    pprint(result)

main()

This uses the recursion idea I mentioned in my comment.

With test directory structure:

/tmp/foo
├── D
│   ├── G
│   │   ├── h
│   │   └── i
│   ├── e
│   └── f
├── a
├── b
└── c

I get:

['a', 'c', 'i', 'h', 'f', 'e', 'b']

If I change this line:

result.append(entry)

to:

result.append(entrypath)

then I get:

['/tmp/foo/a',
 '/tmp/foo/c',
 '/tmp/foo/D/G/i',
 '/tmp/foo/D/G/h',
 '/tmp/foo/D/f',
 '/tmp/foo/D/e',
 '/tmp/foo/b']

To get the exact result you wanted, you can do

dtry = [getAllFiles("/tmp/foo")]
dtry.append(list(dtry[0]))
dtry.append(list(dtry[0]))

And if you want to use os.walk, which is more compact, here are the two flavors of that:

def getAllFiles2(dir):
    result = []
    for root, dirs, files in os.walk(dir):
        result.extend(files)
    return result

def getAllFilePaths2(dir):
    result = []
    for root, dirs, files in os.walk(dir):
        result.extend([os.path.join(root, f) for f in files])
    return result

These produce the same results (order aside) as the recursive versions.
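
As a rough check of that claim, the sketch below builds a throwaway copy of the test tree and compares the two approaches after sorting. The function names and the temporary-directory setup are hypothetical, chosen only for this illustration. One assumption to keep in mind: the tree contains no symbolic links. With links present the two approaches may differ, since os.walk does not descend into symlinked directories by default while the os.path.isdir check in the recursive version follows them.

```python
import os
import shutil
import tempfile


def get_all_files_recursive(folder, result=None):
    """Collect bare file names under folder using os.listdir and recursion."""
    if result is None:
        result = []
    for entry in os.listdir(folder):
        entry_path = os.path.join(folder, entry)
        if os.path.isdir(entry_path):
            get_all_files_recursive(entry_path, result)
        else:
            result.append(entry)
    return result


def get_all_files_walk(folder):
    """Collect bare file names under folder using os.walk."""
    result = []
    for root, dirs, files in os.walk(folder):
        result.extend(files)
    return result


def build_sample_tree():
    """Create a temporary tree shaped like the /tmp/foo example (hypothetical helper)."""
    base = tempfile.mkdtemp()
    os.makedirs(os.path.join(base, "D", "G"))
    for rel in ("a", "b", "c", "D/e", "D/f", "D/G/h", "D/G/i"):
        path = os.path.join(base, *rel.split("/"))
        open(path, "w").close()  # create an empty file
    return base


base = build_sample_tree()
try:
    # Sort both results because neither approach guarantees an order.
    recursive_names = sorted(get_all_files_recursive(base))
    walk_names = sorted(get_all_files_walk(base))
finally:
    shutil.rmtree(base)  # always clean up the temporary tree

print(recursive_names == walk_names)
```

Sorting before comparing matters because the order returned by os.listdir is arbitrary.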

CryptoFool
  • Thanks, I ended up using the os.walk method. Can you explain how the os.listdir method works? I think I understand most of it, but what does the 'result = None' argument do? – askingaq Apr 03 '19 at 21:47
  • Sure, no problem. With the recursive approach, you're going to pass a single results list to each level of recursion. Files just keep getting added to this list at each level. You could have the initial caller have to pass in an empty list to get the process started. All the result=None does is say that the parameter that takes the results list is optional. This lets the caller not have to bother to pass an empty list. If no list is passed in that param, the code creates the initial empty list. When the routine calls itself, it always passes in a list parameter. – CryptoFool Apr 03 '19 at 21:58
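
A related point that may help here: one common reason to write result=None rather than result=[] is that Python evaluates a default value once, when the function is defined, so a list default would be shared between calls. The sketch below illustrates this with two hypothetical functions (collect_shared and collect_fresh are made up for this example and are not part of the answer's code).

```python
def collect_shared(item, result=[]):
    # Pitfall: this default list is created once, at definition time,
    # and is reused by every call that omits the argument.
    result.append(item)
    return result


def collect_fresh(item, result=None):
    # The None sentinel lets each top-level call start with its own new list.
    if result is None:
        result = []
    result.append(item)
    return result


shared_first = list(collect_shared("a"))   # snapshot after the first call
shared_second = list(collect_shared("b"))  # the earlier "a" is still present

fresh_first = collect_fresh("a")
fresh_second = collect_fresh("b")

print(shared_second)  # ['a', 'b']
print(fresh_second)   # ['b']
```

With the None form, calling getAllFiles twice on different folders would not mix the two sets of results.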
0

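You're making an easy problem very hard. This works on Python 3.5 or later, where glob accepts recursive=True (note that the pattern matches subfolders as well as files, and returns full paths):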

from glob import glob

files = glob(r'C:\Users\jonsnow\OneDrive\Documents\my_python\Testing\Testing\**\*', recursive=True)
result = [files for _ in range(3)]

Note that this produces a list with three references to the original list. If you need three identical copies:

from glob import glob

files = glob(r'C:\Users\jonsnow\OneDrive\Documents\my_python\Testing\Testing\**\*', recursive=True)
result = [files.copy() for _ in range(3)]
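
To make the difference between references and copies concrete, here is a small sketch using stand-in data (the short lists of names are hypothetical placeholders for the real file list). It uses list(...) for the copies, on the assumption that Python 2 compatibility matters, since list.copy() only exists on Python 3.3 and later.

```python
files = ["a", "b", "c"]  # stand-in for the real list of file names

# Three references to one list: a change through any slot shows up in all.
shared = [files for _ in range(3)]
shared[0].append("d")

# Three independent shallow copies: a change to one leaves the others alone.
names = ["a", "b", "c"]
copies = [list(names) for _ in range(3)]
copies[0].append("d")

print(shared[1])  # ['a', 'b', 'c', 'd']
print(copies[1])  # ['a', 'b', 'c']
```
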
Grismar
  • Note that `glob` is a library that requires no installation; it's part of the standard library in Python 2 as well as 3. – Grismar Apr 03 '19 at 00:42
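
Two caveats may be worth sketching for the glob approach. The recursive=True argument was added in Python 3.5, so it is not available on Python 2, and a `**/*` pattern matches directories as well as files. The following is a minimal sketch, assuming Python 3.5 or later, that filters the matches down to files and keeps only their base names. The temporary tree and the variable names are hypothetical and exist only for this illustration.

```python
import glob
import os
import shutil
import tempfile

# Build a tiny throwaway tree: one file at the top, one in a subfolder.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "sub"))
for rel in ("top.txt", os.path.join("sub", "inner.txt")):
    open(os.path.join(base, rel), "w").close()

try:
    # glob.escape guards against special characters in the base path.
    pattern = os.path.join(glob.escape(base), "**", "*")
    matches = glob.glob(pattern, recursive=True)  # requires Python 3.5+

    # The raw matches include the "sub" directory itself.
    dir_matches = [p for p in matches if os.path.isdir(p)]

    # Keep files only, and reduce each full path to its file name.
    file_names = sorted(os.path.basename(p) for p in matches if os.path.isfile(p))
finally:
    shutil.rmtree(base)  # always clean up the temporary tree

print(file_names)  # ['inner.txt', 'top.txt']
```

On Python 2, the os.walk approach from the other answer is the more direct route.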