50

I'm trying to create a function that returns the number of files found in a directory and its subdirectories. I just need help getting started.

Trenton McKinney
Bob

6 Answers

105

One-liner:

import os
# raw string so the backslashes in the Windows path are not read as escapes
cpt = sum(len(files) for r, d, files in os.walk(r"G:\CS\PYTHONPROJECTS"))
kiriloff
  • Could you explain why you need the sum function? Why wouldn't len(files) be sufficient? – G Warner Jun 23 '15 at 14:40
  • 8
    @GWarner os.walk yields a separate list of files for each directory it visits. You must sum over the length of each list to get the total number of files. Without sum, the comprehension only gives you a list where each element is the number of files in its associated directory. – Lightyear Buzz Jun 29 '15 at 21:51
  • 1
    Note that you need to use forward slashes (or doubled backslashes, \\) instead of single backslashes in the path; otherwise Python may treat them as escape sequences. – starwarswii Oct 19 '19 at 01:24
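To make the point about sum concrete, here is a minimal sketch rather than part of the answer above. It builds a small throwaway tree in a temporary directory (the file names and the helper names files_per_directory and make_demo_tree are made up for illustration) and prints both the per-directory counts and their sum.

```python
import os
import tempfile

def files_per_directory(folder):
    """Return one file count per directory visited by os.walk."""
    return [len(files) for _root, _dirs, files in os.walk(folder)]

def make_demo_tree(base):
    """Create a tiny hypothetical tree: 2 files at the top, 3 in a subfolder."""
    sub = os.path.join(base, "sub")
    os.mkdir(sub)
    for name in ("a.txt", "b.txt"):
        open(os.path.join(base, name), "w").close()
    for name in ("c.txt", "d.txt", "e.txt"):
        open(os.path.join(sub, name), "w").close()

with tempfile.TemporaryDirectory() as demo_root:
    make_demo_tree(demo_root)
    per_dir = files_per_directory(demo_root)  # top directory first, then "sub"
    total = sum(per_dir)
    print(per_dir)  # [2, 3]
    print(total)    # 5
```

The list alone tells you how many files each directory holds; only the sum answers the original question.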
31

Use os.walk. It will do the recursion for you. See http://www.pythonforbeginners.com/code-snippets-source-code/python-os-walk/ for an example.

import os

total = 0
# folder is the path of the top-level directory to scan
for root, dirs, files in os.walk(folder):
    total += len(files)
Hans Then
6

Just add an elif statement that takes care of the directories:

import os

def fileCount(folder):
    "count the number of files in a directory and its subdirectories"

    count = 0

    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)

        if os.path.isfile(path):
            count += 1
        elif os.path.isdir(path):
            # recurse into the subdirectory and add its file count
            count += fileCount(path)

    return count
Blender
2
  • Here are some one-liners using pathlib, which is part of the standard library.
  • Use Path.cwd().rglob('*') or Path('some path').rglob('*'), which creates a generator of every file and directory below the path.
    • Unpack the generator with list or *, and use len to get the number of entries; filter with is_file() to count only files.
  • See How to count total number of files in each subfolder to get the total number of files for each directory.
from pathlib import Path

# count every entry (files and directories)
total_dir_files = len(list(Path.cwd().rglob('*')))

# or
total_dir_files = len([*Path.cwd().rglob('*')])

# or filter for only files using is_file()
file_count = len([f for f in Path.cwd().rglob('*') if f.is_file()])
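As a possible variation on the one-liners above (a sketch, not the answer author's code), the generator can be consumed with sum instead of being turned into a list first, which avoids holding every path in memory on large trees. The function name count_files and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

def count_files(root):
    """Count regular files below root without building a list first."""
    return sum(1 for p in Path(root).rglob('*') if p.is_file())

with tempfile.TemporaryDirectory() as demo_root:
    base = Path(demo_root)
    (base / "sub").mkdir()
    (base / "a.txt").touch()
    (base / "sub" / "b.txt").touch()

    file_count = count_files(base)                  # a.txt and sub/b.txt
    entry_count = sum(1 for _ in base.rglob('*'))   # also counts the "sub" directory
    print(file_count, entry_count)  # 2 3
```

The difference between the two numbers shows why the is_file() filter matters when only files should be counted.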
Trenton McKinney
1

Here is a time-test for the 3 most popular methods:

import os
from datetime import datetime

dir_path = "D:\\Photos"

# os.listdir

def recursive_call(dir_path):
    folder_array = os.listdir(dir_path)
    files = 0
    folders = 0
    for path in folder_array:
        if os.path.isfile(os.path.join(dir_path, path)):
            files += 1
        elif os.path.isdir(os.path.join(dir_path, path)):
            folders += 1
            file_count, folder_count = recursive_call(os.path.join(dir_path, path))
            files += file_count
            folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.listdir): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.walk

start_time = datetime.now()
file_array = [len(files) for r, d, files in os.walk(dir_path)]
files = sum(file_array)
folders = len(file_array)  # one entry per directory visited, including dir_path itself
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.walk): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.scandir

def recursive_call(dir_path):
    files = 0
    folders = 0
    # the context manager closes the scandir iterator as soon as we are done
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.is_file():
                files += 1
            elif entry.is_dir():
                folders += 1
                file_count, folder_count = recursive_call(entry.path)
                files += file_count
                folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.scandir): %s seconds" % (datetime.now() - start_time).total_seconds())

Results:

Folders: 53, Files: 29048
Time Taken (os.listdir): 3.074945 seconds

Folders: 53, Files: 29048
Time Taken (os.walk): 0.062022 seconds

Folders: 53, Files: 29048
Time Taken (os.scandir): 0.048984 seconds

Conclusion:

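While os.walk is the most elegant, the recursive os.scandir implementation was the fastest in this test, slightly ahead of os.walk (which has itself used os.scandir internally since Python 3.5). Both were roughly 50 to 60 times faster than the recursive os.listdir version.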
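If you want to repeat the comparison yourself, a single datetime-based run can be noisy, and the first traversal of a tree may be slower than later ones if the operating system caches directory data. The sketch below is an assumption about how one might time it more carefully, not the method used for the results above: it uses time.perf_counter and keeps the fastest of several runs. The helper names best_time and count_with_walk are hypothetical, and it scans the current working directory only as a placeholder.

```python
import os
import time

def count_with_walk(folder):
    """Count files below folder using os.walk."""
    return sum(len(files) for _root, _dirs, files in os.walk(folder))

def best_time(func, *args, repeats=3):
    """Call func(*args) `repeats` times; return (last result, fastest seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

# placeholder target: replace os.getcwd() with the directory you want to measure
count, seconds = best_time(count_with_walk, os.getcwd())
print("Files: %d, best of 3: %.6f seconds" % (count, seconds))
```

Any of the three counting functions from the test above can be passed to best_time in place of count_with_walk.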

leenremm
0

Here is my version:

import os

def fileCount(folder, allowed_extensions=None):
    "count the number of files in a directory and its subdirectories"
    count = 0
    for base, dirs, files in os.walk(folder):
        for file in files:
            # count every file, or only those with an allowed extension
            if not allowed_extensions or file.endswith(allowed_extensions):
                count += 1
    return count

scan_dir = r"C:\Users\sannjayy\Desktop"

allowed_extensions = (".jpg", ".mp4")

print(fileCount(scan_dir, allowed_extensions))
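One thing to keep in mind is that str.endswith is case-sensitive, so a file named PHOTO.JPG would not match ".jpg". The following is a sketch of a case-insensitive variant, assuming that is the behavior you want; the function name count_files_by_extension and the demo file names are made up for illustration.

```python
import os
import tempfile

def count_files_by_extension(folder, allowed_extensions=None):
    """Count files under folder; extensions, if given, match case-insensitively."""
    allowed = None
    if allowed_extensions:
        allowed = tuple(ext.lower() for ext in allowed_extensions)
    count = 0
    for _base, _dirs, files in os.walk(folder):
        for name in files:
            # lower-case the file name so ".JPG" and ".jpg" are treated alike
            if allowed is None or name.lower().endswith(allowed):
                count += 1
    return count

with tempfile.TemporaryDirectory() as demo_root:
    for name in ("a.jpg", "B.JPG", "c.txt"):
        open(os.path.join(demo_root, name), "w").close()

    filtered = count_files_by_extension(demo_root, (".jpg", ".mp4"))
    everything = count_files_by_extension(demo_root)
    print(filtered, everything)  # 2 3
```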

Sanjay Sikdar