
So, I have the following application...

I have a folder with more than ten thousand folders in it. Each folder is a job, and all of them are named in the same format:

"ten digits" + _ + "the name of the job"

Like this:

"1234567890_Stackoverflow"

How can I find the full name of that folder with just the first 10 digits?

This is my Python code, where path is my current working directory and job is the ten digits I want to find:

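import fnmatch
import os

job_name = None
for dirs in os.listdir(path):
    if fnmatch.fnmatch(dirs, job + "*"):
        job_name = dirs
        break

# job_name stays None if no folder matches
if job_name is not None:
    job_path = os.path.join(path, job_name)
    print(job_path)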

With this code, I can find the full name of the job, but no matter which job I ask for, it always takes 23 seconds to find it.

Is there a faster way to find it?

I mean, if I go to the jobs folder and search by hand, I can find the folder in 4-5 seconds.

  • Have you tried the standard Python library function str.startswith, instead of fnmatch? You don't need to do a wildcard pattern match since you know exactly what the first 10 characters are. – Paul Cornelius Jul 04 '20 at 23:30
  • Yep, I have tried that too, but it always takes 23 sec – Jorge Rangel Jul 04 '20 at 23:40
  • @PaulCornelius, do you know if it is possible to build a string from my path and the job number, like "\\mypath\123456789*", and then look it up without having to read and match all the folders inside my path? – Jorge Rangel Jul 04 '20 at 23:48
  • If you're on Windows you could try launching "dir" with an argument like 123456789* as a subprocess (see Jonatan's answer). I don't know if it would be faster. You need to figure out if you are limited by the OS or by the listdir function. If you type "dir 0123456789*" at the command line, how long does it take? If you put a print(dirs) statement as the first statement inside your loop, how long does it take before the first line of output appears? What's weird is that it always takes 23 seconds, whether it's first or last in the listing. – Paul Cornelius Jul 05 '20 at 01:04
  • @PaulCornelius, hey, I think I found the real issue: the program always takes 23 seconds because the path is a network drive. I checked and noticed that the program takes about 22.5 sec in `os.listdir()` and then only about 0.5 sec to get me the desired path. – Jorge Rangel Jul 05 '20 at 03:50
  • Good discovery. Unfortunately it seems to mean that you can't write a faster program, but at least you know why :) – Paul Cornelius Jul 06 '20 at 00:12
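
The measurement described in the last comments can be reproduced with a short script. This is only a minimal sketch: `time_lookup` is a hypothetical helper name, and `path` and `job` are assumed to be the same values as in the question. It times the directory listing and the prefix match separately, using `str.startswith` as suggested in the first comment.

```python
import os
import time


def time_lookup(path, job):
    """Time the directory listing and the prefix match separately."""
    start = time.perf_counter()
    names = os.listdir(path)  # on a network drive this is likely the slow part
    listed = time.perf_counter()

    # Plain prefix test; no wildcard matching is needed for a fixed prefix.
    match = next((n for n in names if n.startswith(job + "_")), None)
    matched = time.perf_counter()

    print(f"listdir: {listed - start:.3f}s, match: {matched - listed:.3f}s")
    return match
```

If the first number dominates, as reported above, changing the matching code (fnmatch, startswith, glob) is unlikely to help, because every variant still has to fetch the listing.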

2 Answers


Here is how you can use glob.glob():

from glob import glob

path = 'C:\\Users\\User\\Desktop\\Folder'
ten_digits = '1234567890'
for file in glob(f"{path}\\{ten_digits}_*"):
    print(file)
Red

The OS itself may be faster. You can try:

import subprocess

number = 54932034857
# Let the shell expand the wildcard, then take the first matching name.
dirname = subprocess.run(f"ls -d {number}*",
    stdout=subprocess.PIPE,
    shell=True).stdout.decode("utf-8").split("\n")[0]
print(dirname)

You can probably use dir instead of ls if you're on Windows.

Jonatan Öström
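
Since the comments on the question found that almost all of the 23 seconds is spent listing the network folder, one possible workaround, if several jobs are looked up in the same run, is to list the folder once and keep an index in memory. The sketch below is an assumption-laden illustration, not something proposed in the thread: `build_job_index` and `find_job_path` are hypothetical names, and it assumes every job folder is named with ten digits, an underscore, and the job name.

```python
import os


def build_job_index(path):
    """Scan the directory once and map each ten-digit prefix to its folder name."""
    index = {}
    with os.scandir(path) as entries:
        for entry in entries:
            prefix, sep, _ = entry.name.partition("_")
            # Keep only directories shaped like "<ten digits>_<job name>".
            if sep and len(prefix) == 10 and prefix.isdigit() and entry.is_dir():
                index[prefix] = entry.name
    return index


def find_job_path(index, path, job):
    """Return the full path for a job number, or None if it is not in the index."""
    name = index.get(job)
    return os.path.join(path, name) if name is not None else None
```

The first call to `build_job_index` still pays the full listing cost, so this only helps when more than one lookup is made per run. The index also goes stale if folders are added or renamed afterwards.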