I'm trying to create a function that returns the number of files found in a directory and its subdirectories. I just need help getting started.
6 Answers
105
One-liner:
import os
cpt = sum([len(files) for r, d, files in os.walk("G:\CS\PYTHONPROJECTS")])

kiriloff
- Could you explain why you need the sum function? Why wouldn't len(files) be sufficient? – G Warner Jun 23 '15 at 14:40
- @GWarner os.walk yields a separate set of files for each subdirectory, so you must sum the lengths of those sets to get the total number of files. Without sum, the list comprehension just gives you a list where each element is the number of files in its associated subdirectory. – Lightyear Buzz Jun 29 '15 at 21:51
- Note that you need to use forward slashes (or \\) instead of backslashes as you have here; otherwise Python thinks you're using escapes. – starwarswii Oct 19 '19 at 01:24
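To make the point in the comments concrete, here is a minimal sketch that builds a small throwaway tree and compares the per-directory counts with their sum. The names `count_files` and `make_demo_tree` and the temporary directory layout are made up for illustration; they are not part of the answer.

```python
import os
import tempfile


def count_files(folder):
    """Return the total number of files under folder, including subdirectories."""
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # so the per-directory counts have to be added together.
    return sum(len(files) for _, _, files in os.walk(folder))


def make_demo_tree(root):
    """Create two files in root and one file in root/sub (hypothetical layout)."""
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        per_dir = [len(files) for _, _, files in os.walk(root)]
        print(per_dir)            # one count per directory visited
        print(count_files(root))  # total across the whole tree
```

On Windows, writing the path as a raw string such as r"G:\CS\PYTHONPROJECTS", or with forward slashes, sidesteps the escape problem mentioned in the last comment.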
31
Use os.walk. It will do the recursion for you. See http://www.pythonforbeginners.com/code-snippets-source-code/python-os-walk/ for an example.
import os
total = 0
for root, dirs, files in os.walk(folder):  # folder: path of the directory to scan
    total += len(files)

Hans Then
6
Just add an elif statement that takes care of the directories:
import os

def fileCount(folder):
    "count the number of files in a directory"
    count = 0
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        if os.path.isfile(path):
            count += 1
        elif os.path.isdir(path):
            # recurse into the subdirectory and add its file count
            count += fileCount(path)
    return count

Blender
2
- Here are some one-liners using pathlib, which is part of the standard library.
- Use Path.cwd().rglob('*') or Path('some path').rglob('*'), which creates a generator of all the entries (files and directories).
  - Unpack the generator with list or *, and use len to get the number of entries.
- See "How to count total number of files in each subfolder" to get the total number of files for each directory.
from pathlib import Path
total_dir_files = len(list(Path.cwd().rglob('*')))
# or
total_dir_files = len([*Path.cwd().rglob('*')])
# or filter for only files using is_file()
file_count = len([f for f in Path.cwd().rglob('*') if f.is_file()])
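One thing worth spelling out: rglob('*') yields directories as well as files, so the first two one-liners count both. A minimal sketch showing the difference on a throwaway tree (the function names and the temporary layout are invented for illustration):

```python
import tempfile
from pathlib import Path


def count_entries(folder):
    """Count every path under folder: files and directories alike."""
    return sum(1 for _ in Path(folder).rglob("*"))


def count_files_only(folder):
    """Count only regular files under folder."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        (root / "a.txt").write_text("x")
        (root / "sub" / "b.txt").write_text("x")
        print(count_entries(root))     # includes the "sub" directory itself
        print(count_files_only(root))  # files only
```

Using sum over a generator also avoids building the whole list in memory, which may matter on very large trees.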

Trenton McKinney
1
Here is a time-test for the 3 most popular methods:
import os
from datetime import datetime

dir_path = "D:\\Photos"

# os.listdir
def recursive_call(dir_path):
    folder_array = os.listdir(dir_path)
    files = 0
    folders = 0
    for path in folder_array:
        if os.path.isfile(os.path.join(dir_path, path)):
            files += 1
        elif os.path.isdir(os.path.join(dir_path, path)):
            folders += 1
            file_count, folder_count = recursive_call(os.path.join(dir_path, path))
            files += file_count
            folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.listdir): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.walk
start_time = datetime.now()
file_array = [len(files) for r, d, files in os.walk(dir_path)]
files = sum(file_array)
# os.walk also yields dir_path itself; exclude it to match the other two counts
folders = len(file_array) - 1
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.walk): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.scandir
def recursive_call(dir_path):
    folder_array = os.scandir(dir_path)
    files = 0
    folders = 0
    for path in folder_array:
        if path.is_file():
            files += 1
        elif path.is_dir():
            folders += 1
            file_count, folder_count = recursive_call(path)
            files += file_count
            folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.scandir): %s seconds" % (datetime.now() - start_time).total_seconds())
Results:
Folders: 53, Files: 29048
Time Taken (os.listdir): 3.074945 seconds
Folders: 53, Files: 29048
Time Taken (os.walk): 0.062022 seconds
Folders: 53, Files: 29048
Time Taken (os.scandir): 0.048984 seconds
Conclusion:
While os.walk is the most elegant, os.scandir implemented recursively seems to be the fastest.
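A single datetime.now() measurement can be skewed by run order, since whichever method touches the disk first may pay for a cold file cache. Here is a hedged sketch of a more even comparison using time.perf_counter and repeated runs. The names `count_walk`, `count_scandir`, and `best_of` are invented here and are not from the answer, and this scandir variant uses an explicit stack instead of recursion.

```python
import os
import time


def count_walk(folder):
    """Count files with os.walk."""
    return sum(len(files) for _, _, files in os.walk(folder))


def count_scandir(folder):
    """Count files with an explicit stack over os.scandir (no recursion)."""
    total = 0
    stack = [folder]
    while stack:
        current = stack.pop()
        # The context manager closes the scandir iterator promptly.
        with os.scandir(current) as entries:
            for entry in entries:
                # Symlinks are neither followed nor counted in this sketch.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def best_of(func, folder, repeats=3):
    """Return (result, fastest elapsed seconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(folder)
        best = min(best, time.perf_counter() - start)
    return result, best


if __name__ == "__main__":
    for func in (count_walk, count_scandir):
        n, seconds = best_of(func, ".")
        print(f"{func.__name__}: {n} files, best of 3: {seconds:.6f} s")
```

Taking the best of several runs means every method is measured with a warm cache, so the numbers reflect the traversal code rather than the first disk read.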

leenremm
0
Here is my version:
import os

def fileCount(folder, allowed_extensions=None):
    "count the number of files in a directory and its subdirectories"
    count = 0
    for base, dirs, files in os.walk(folder):
        for file in files:
            # with no filter every file counts; otherwise match on the extension tuple
            if not allowed_extensions or file.endswith(allowed_extensions):
                count += 1
    return count

scan_dir = r"C:\Users\sannjayy\Desktop"
allowed_extensions = (".jpg", ".mp4")
print(fileCount(scan_dir, allowed_extensions))
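str.endswith accepts a tuple of suffixes, which is why the filter above works with several extensions at once. It is case-sensitive, though, so a file named photo.JPG would be skipped. A small sketch of a case-insensitive variant, assuming that behaviour is wanted; `count_by_extension` is a hypothetical name, not part of the answer:

```python
import os


def count_by_extension(folder, allowed_extensions=None):
    """Count files under folder, optionally keeping only the given extensions.

    Matching ignores case, so ".jpg" also matches "PHOTO.JPG".
    """
    if allowed_extensions:
        # Normalise once; str.endswith needs a str or a tuple of str.
        allowed = tuple(ext.lower() for ext in allowed_extensions)
    else:
        allowed = None
    count = 0
    for _, _, files in os.walk(folder):
        for name in files:
            if allowed is None or name.lower().endswith(allowed):
                count += 1
    return count
```

Called as count_by_extension(scan_dir, (".jpg", ".mp4")), it should count upper- and lower-case variants of both extensions; called without the second argument, it counts every file.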

Sanjay Sikdar