Finding path names with non-consecutive numbers using glob in Python

Question

Is there a way to find file names with numbers that are not consecutive? More specifically, I'm looking to list filenames with these numbers included:

path +'*.s201701*.nc'
path +'*.s201801*.nc'
path +'*.s201901*.nc'
path +'*.s201702*.nc'
path +'*.s201802*.nc'
path +'*.s201902*.nc'
path +'*.s201712*.nc'
path +'*.s201812*.nc'
path +'*.s201912*.nc'

I can match '2017' through '2019' with a range, since those numbers are consecutive, but not '01', '02', and '12', because these aren't. This doesn't work:

glob.glob(path +'*.s201[7-9][01,02,12]*.nc')

And this works,

glob.glob(path +'*.s201[7-9][0-1][1-2]*.nc')

but also gives me files in s201*11*.nc, which I don't want. Any tips?

score 1 · Accepted Answer · answered Apr 10 '19 at 04:32

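You can't do this with a single glob. The pattern language isn't sophisticated enough: a bracket expression such as [01,02,12] matches exactly one character from the set, not one of several two-digit alternatives. But you can do it with two: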

glob.glob(path +'*.s201[7-9]0[1-2]*.nc') + glob.glob(path +'*.s201[7-9]12*.nc')
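If the list of months grows, the same idea extends to one glob per month. This is only a minimal sketch, assuming the path prefix and file naming from the question; the helper name glob_months and its parameters are made up for illustration:

```python
import glob
import os


def glob_months(prefix, months=("01", "02", "12"), year_pattern="201[7-9]"):
    """Run one glob per month string and combine the matches.

    `prefix` plays the role of `path` in the question (a directory path
    ending in a separator). The function name and its defaults are
    hypothetical, chosen only for this sketch.
    """
    matches = set()
    for month in months:
        # The month is spelled out literally, so '11' can never match.
        pattern = f"{prefix}*.s{year_pattern}{month}*.nc"
        matches.update(glob.glob(pattern))
    # A set avoids duplicates if one file happens to match two patterns.
    return sorted(matches)
```

Calling glob_months(path) should then give the same files as the two concatenated globs above, sorted by name.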

Nathan Vērzemnieks

score 0 · Answer 2 · answered Apr 09 '19 at 20:12

You could check for repeated digits by running a regex over the results from os.listdir. I made a sample file with repeated digits in its name, in the same directory as the script. The first list comprehension below returns an empty list; removing the 'not' returns the offending file name.

import os
import re

files = [f for f in os.listdir(path) if not re.search(r'(\d)\1+\b', f)]

print(files)
[]

Removing the 'not' to find repeat numbers:

files = [f for f in os.listdir(path) if re.search(r'(\d)\1+\b', f)]
print(files)
['s201911.txt']
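The same os.listdir plus regex approach can also select the wanted months directly, since a regex alternation can express what glob brackets cannot. A rough sketch, assuming the file naming from the question (".s", year, month, anything, ".nc"); the names WANTED and wanted_files are invented for this example:

```python
import os
import re

# Year 2017-2019 followed by month 01, 02 or 12, as listed in the question.
# (?:01|02|12) is an alternation, so '11' is rejected.
WANTED = re.compile(r"\.s201[7-9](?:01|02|12).*\.nc$")


def wanted_files(directory):
    """Return the sorted file names in `directory` that match WANTED."""
    return sorted(f for f in os.listdir(directory) if WANTED.search(f))
```

Note that os.listdir returns bare file names, so join them with the directory (os.path.join) if full paths are needed.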
