Does anyone know why my code only finds the files for the first item in the list and not for the others? That is, it only matches files named '*a.LOG.bz2'.

Question

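from datetime import datetime
from itertools import repeat
from pathlib import Path

import pandas as pd

l='1001'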

dts_lst=list(pd.date_range(datetime.strptime('2022-03-15', '%Y-%m-%d'), Dt_now, freq='D').strftime('%Y-%m-%d')) # list of days

p1=map(lambda x, y:Path(drive / x / 'foldera' / y / 'folderb' ), dts_lst, repeat(l))

to_search=['*a.LOG.bz2', '*b.LOG.bz2', '*c.LOG.bz2' ]

for i in to_search:
    f1=map(lambda x, y:Path(x).rglob(y), p2, repeat(i))
    for i2 in f1:
        print(f"this:::{i2}")

What is `p2`? `Dt_now` and `drive` are also undefined, although I can guess what those might be. Also, obviously we can't know if your code should be matching files or not, because we don't know what the directory/file hierarchy you're applying this code to looks like. — CryptoFool, Mar 19 '22 at 02:01
@Gino Mempin That's actually p1. There was a line (p2 = p1), but I had trouble correcting the code in the post since I'm still new to Stack Overflow. Dt_now is datetime.now(). Anyway, wrapping the map in list() worked: `p1=list(map(lambda x, y:Path(drive / x / 'foldera' / y / 'folderb' ), dts_lst, repeat(l)))` — Faheem Khan, Mar 19 '22 at 03:30

CryptoFool · Accepted Answer · 2022-03-19T02:40:41.340

Assuming that p2 is supposed to be p1, or is otherwise a map object similar to p1, I see the problem. It starts with this line:

p1=map(lambda x, y:Path(drive / x / 'foldera' / y / 'folderb' ), dts_lst, repeat(l))

This creates a map object, which is a kind of iterator. Once you've built one, you access it like any other iterator: it yields a series of values until it reaches the end of the sequence it is meant to provide.

Your problem is that you are iterating over this map object three times, once for each pattern in to_search. You can't do that: an iterator can only be consumed once. The first pass, for '*a.LOG.bz2', exhausts it. After that, the iterator is at the end of its sequence, so asking it for more values returns nothing, and the searches for '*b.LOG.bz2' and '*c.LOG.bz2' have no directories to look in.
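A minimal illustration of the exhaustion, using plain strings instead of Path objects and two hypothetical dates so it runs anywhere: the same map object is read twice, and the second read comes back empty.

```python
from itertools import repeat

# A map object is a one-shot iterator: looping over it consumes it.
dirs = map(lambda x, y: f"{x}/{y}", ["2022-03-15", "2022-03-16"], repeat("1001"))

first_pass = list(dirs)   # consumes every value
second_pass = list(dirs)  # nothing is left to yield

print(first_pass)   # ['2022-03-15/1001', '2022-03-16/1001']
print(second_pass)  # []
```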

The simplest way to fix your code is to create a list from the map object that you are assigning to p1:

p1=list(map(lambda x, y:Path(drive / x / 'foldera' / y / 'folderb' ), dts_lst, repeat(l)))

You can iterate over a list multiple times, so this works fine. Another option would be to put the calculation of p1 inside your loop, so that a new map object is built on every pass:

l='1001'

dts_lst=list(pd.date_range(datetime.strptime('2022-03-15', '%Y-%m-%d'), Dt_now, freq='D').strftime('%Y-%m-%d')) # list of days

to_search=['*a.LOG.bz2', '*b.LOG.bz2', '*c.LOG.bz2' ]

for i in to_search:
    # Build a fresh map object on every pass, because the previous one is exhausted
    p1=map(lambda x, y:Path(drive / x / 'foldera' / y / 'folderb' ), dts_lst, repeat(l))
    p2 = p1  # assuming p2 is meant to be p1
    f1=map(lambda x, y:Path(x).rglob(y), p2, repeat(i))
    for i2 in f1:
        print(f"this:::{i2}")
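To check the list() fix end to end, here is a self-contained sketch. It assumes a drive/<date>/foldera/<l>/folderb layout like the one in the question, builds a throwaway copy of it in a temporary directory with hypothetical file names, and confirms that every pattern now finds its files. Since rglob() yields matches lazily, the sketch also iterates each rglob() result to collect actual file paths.

```python
import tempfile
from itertools import repeat
from pathlib import Path

to_search = ["*a.LOG.bz2", "*b.LOG.bz2", "*c.LOG.bz2"]
dts_lst = ["2022-03-15", "2022-03-16"]  # hypothetical dates
l = "1001"

with tempfile.TemporaryDirectory() as tmp:
    drive = Path(tmp)  # temporary stand-in for the question's drive

    # Build a fake drive/<date>/foldera/<l>/folderb tree with one file per pattern
    for d in dts_lst:
        folder = drive / d / "foldera" / l / "folderb"
        folder.mkdir(parents=True)
        for letter in "abc":
            (folder / f"x{letter}.LOG.bz2").touch()

    # list() materialises the paths once, so every pattern can reuse them
    p1 = list(map(lambda x, y: drive / x / "foldera" / y / "folderb", dts_lst, repeat(l)))

    found = {}
    for pattern in to_search:
        # rglob() returns a lazy iterator, so iterate it to collect the matches
        found[pattern] = [str(m) for folder in p1 for m in folder.rglob(pattern)]

for pattern, matches in found.items():
    print(pattern, len(matches))  # each pattern matches one file per date
```

With the original bare map object in place of the list, only the first pattern would report any matches.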

As an aside, you don't need the repeat() iterators here. Your lambda functions can take a single parameter and refer directly to the value you were passing to repeat():

p1=map(lambda x:Path(drive / x / 'foldera' / l / 'folderb' ), dts_lst)

f1=map(lambda x:Path(x).rglob(i), p2)
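A quick check, using hypothetical values for drive, l and dts_lst, that the single-parameter lambda builds the same paths as the repeat() version. A list comprehension is shown as an equivalent alternative. One caveat: a lambda that refers to the loop variable i looks it up when it runs, not when it is created, so lambda x: Path(x).rglob(i) is safe here only because f1 is consumed within the same loop iteration.

```python
from itertools import repeat
from pathlib import Path

drive = Path("/data")  # hypothetical stand-in for the question's drive
l = "1001"
dts_lst = ["2022-03-15", "2022-03-16"]

# Original style: the constant l is fed in through repeat()
with_repeat = list(map(lambda x, y: drive / x / "foldera" / y / "folderb", dts_lst, repeat(l)))
# The lambda can refer to l directly instead of receiving it as a parameter
without_repeat = list(map(lambda x: drive / x / "foldera" / l / "folderb", dts_lst))
# A list comprehension builds the same list and is reusable like any list
comprehension = [drive / x / "foldera" / l / "folderb" for x in dts_lst]

print(with_repeat == without_repeat == comprehension)  # True
```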

Thanks, wrapping the map in list() worked. FYI: I am scanning a remote directory for specific files. The directory structure is "/A/date/range(1-100)/files". The file records (names and dates only) are pushed to a Postgres database. When I scan the directory for new files, I push the list (i.e. `i2` in the code above) to the DB. If the filename already exists in the database (i.e. on conflict), the filename is stored in the third column of the record table, which I query into a Python list, list A. Then, if a file I filtered doesn't exist in list A, I process it; otherwise I skip it. — Faheem Khan, Mar 19 '22 at 04:01
In short: 1) scan the directory for new files; 2) push the file names to the database, which on conflict stores the names in a separate column; 3) query the conflict column into a Python list, listA; 4) in Python, process only the files that are not in listA. — Faheem Khan, Mar 19 '22 at 04:03
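Step 4 of that workflow can be sketched as below. The database side is not shown; the sketch assumes list A has already been queried into a plain Python list of bare file names, and files_to_process, scanned and list_a are hypothetical names. If list A holds full paths instead, compare str(p) rather than p.name. Converting list A to a set makes each membership check constant time, which matters once the list grows large.

```python
from pathlib import Path


def files_to_process(scanned, already_seen):
    """Return the scanned paths whose file names are not in already_seen (list A)."""
    seen = set(already_seen)  # set lookups are O(1), unlike scanning a list
    return [p for p in scanned if p.name not in seen]


# Hypothetical data: paths found by rglob() and the conflict column as list A
scanned = [Path("/A/2022-03-15/1/xa.LOG.bz2"), Path("/A/2022-03-15/1/xb.LOG.bz2")]
list_a = ["xa.LOG.bz2"]

for p in files_to_process(scanned, list_a):
    print(p.name)  # xb.LOG.bz2
```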
