I am trying to loop over some compressed files (extension '.gz') and I am running into a problem. I want to perform a specific action when the FIRST file ending in 'aa' is encountered. It can be any of them; it doesn't have to be the first one in the list. Only then should Python check whether there are OTHER 'aa' files in the folder, and if so, a 2nd rule has to be applied to them (there may be anywhere from one to many 'aa' files). Finally, a 3rd rule has to be applied to all other files not ending in 'aa'.
However, when I run the code below, not all the files get processed.
What am I doing wrong?
Thanks!
import os
import pandas as pd

inputPath = "write your path"
fileExt = r".gz"
flag = False  # becomes True once the first 'aa' file has been processed
for item in os.listdir(inputPath):  # loop through items in dir
    if item.endswith(fileExt):  # check for ".gz" extension
        full_path = os.path.join(inputPath, item)  # get full path of files
        if item.endswith('aa' + fileExt) and not flag:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1")  # from gzip to pandas df
            # do something
            flag = True
            print('1 rule:', "The item processed is ", item)
        elif item.endswith('aa' + fileExt) and flag:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1")  # from gzip to pandas df
            # do something else
            print('2 rule:', "The item processed is ", item)
        elif not item.endswith('aa' + fileExt) and flag:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1")  # from gzip to pandas df
            # do something else
            print('3 rule:', "The item processed is ", item)
I believe this happens because Python iterates over the list of files in alphabetical order, so some of the other files are ignored. How can I fix this issue?
LIST OF FILES:
File_202112311aa.gz
File_20211231ab.gz
File_20211231.gz
File_20211231aa.gz
OUTPUT
1 rule The item processed is File_202112311aa.gz
3 rule The item processed is File_20211231ab.gz
2 rule The item processed is File_20211231aa.gz
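
In the output above, File_20211231.gz never appears. With the loop as written, any file that does not end in 'aa' and is visited before the first 'aa' file matches none of the three branches, because the third branch only runs once flag is True. A minimal sketch of one possible way around this, assuming rule 3 should apply to every non-'aa' file no matter when it is visited, is to split the names into two groups before processing anything. The helper name assign_rules and its parameters are hypothetical, and the pandas read step is left out so the snippet runs on plain file names:

```python
def assign_rules(filenames, ext=".gz", marker="aa"):
    """Return (rule, filename) pairs. Hypothetical helper, not part of the original code."""
    # Keep only the compressed files, preserving the incoming order.
    gz_files = [name for name in filenames if name.endswith(ext)]
    # Split into files ending in 'aa' + extension and everything else.
    aa_files = [name for name in gz_files if name.endswith(marker + ext)]
    other_files = [name for name in gz_files if not name.endswith(marker + ext)]

    plan = []
    for index, name in enumerate(aa_files):
        # The first 'aa' file gets rule 1, any further 'aa' files get rule 2.
        plan.append((1 if index == 0 else 2, name))
    # Rule 3 no longer depends on whether an 'aa' file has been seen yet.
    plan.extend((3, name) for name in other_files)
    return plan


# File names taken from the list above, in an order that triggers the skip.
files = [
    "File_20211231.gz",
    "File_202112311aa.gz",
    "File_20211231aa.gz",
    "File_20211231ab.gz",
]
for rule, name in assign_rules(files):
    print(rule, "rule: The item processed is", name)
```

In the real loop, the list would come from os.listdir(inputPath), and each name would be joined with inputPath via os.path.join before being passed to pd.read_csv, as in the code above.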