I have a list like this:

dates = [
    datetime.date(2014, 11, 24),
    datetime.date(2014, 11, 25),
    datetime.date(2014, 11, 26),
    # datetime.date(2014, 11, 27), # This one is missing
    datetime.date(2014, 11, 28),
    datetime.date(2014, 11, 29),
    datetime.date(2014, 11, 30),
    datetime.date(2014, 12, 1)]

I'm trying to find the missing dates between the start and end dates with this expression:

date_set = {dates[0] + timedelta(x) for x in range((dates[-1] - dates[0]).days)}

Strangely enough, it raises an error: it can't access the `dates` variable. But this expression runs fine:

date_set = {date(2015,2,11) + timedelta(x) for x in range((dates[-1] - dates[0]).days)}

I wrote a function that does what I want:

def find_missing_dates(dates: list[date]) -> list[date]:
    """Find the missing dates in a list of dates (that should already be sorted)."""
    date_set = {(first_date + timedelta(x)) for first_date, x in zip([dates[0]] * len(dates), range((dates[-1] - dates[0]).days))}
    missing = sorted(date_set - set(dates))

    return missing

It's an ugly expression, and it forced me to fill a second list with copies of the same value. Does anyone have a cleaner expression?

wjandrea
v1z3
  • `set` is not a proper data container for something ordered – Olvin Roght Oct 10 '21 at 17:59
  • What does "it doesn't work" mean? What are the "start" and "end" dates you're referring to? – Paul M. Oct 10 '21 at 17:59
  • `[d + datetime.timedelta(days=j) for i, d in enumerate(dates[:-1], 1) for j in range(1, (dates[i] - d).days)]` Your `dates` list has to be sorted to use this one-liner. I [answered](https://stackoverflow.com/a/69410981/10824407) a [very similar question](https://stackoverflow.com/q/68651783/10824407) a week ago. – Olvin Roght Oct 10 '21 at 18:01
  • datetime.date(20, 12, 26) is year=20, not 2020. – CodeMonkey Oct 10 '21 at 18:01
  • I just ran it and got two empty sets (assuming `import datetime` and `from datetime import timedelta`). You need to provide a [mre]. – wjandrea Oct 10 '21 at 18:04
  • I don't see why you use `set()` here. – buran Oct 10 '21 at 18:04
  • Also, there are no comprehensions in this code. You're using a generator expression. A set comprehension would look like `date_set = {... for x in ...}`. – wjandrea Oct 10 '21 at 18:05
  • You can do set comprehension like this or with {}, tried it on python3.9 `set(a for a in range(0,10))` {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} – v1z3 Oct 10 '21 at 18:08
  • The use case is I have a bunch of data with filenames containing dates, and I create a list of `datetime.date` objects, then I try to find if there are any missing. So set is appropriate because they should not repeat. The problem is the expression is ugly. – v1z3 Oct 10 '21 at 18:09
  • @v1z3, fight the reason. Do not generate duplicates. `set` is not proper container for ordered data. – Olvin Roght Oct 10 '21 at 18:11
  • *"it can't access the `dates` variable"* -- I can't reproduce that problem. Please provide a [mre]. – wjandrea Oct 10 '21 at 22:34
  • @wjandrea Apparently, the issue only occurs when you use the python debugger pdb because of variable scopes. The expression works fine when not debugging. – v1z3 Oct 10 '21 at 22:55

2 Answers

Something like the following: find the minimum and maximum dates, then loop from the minimum to the maximum and collect every date that is not in the list.

from datetime import timedelta, date

dates = [
    date(2014, 11, 21),
    date(2014, 11, 24),
    date(2014, 11, 25),
    date(2014, 11, 26),
    date(2014, 11, 27),
    date(2014, 11, 28),
    date(2014, 11, 29),
    date(2014, 11, 30),
    date(2014, 12, 1)
]
_min = min(dates)
_max = max(dates)
missing = []
while _min < _max:
    if _min not in dates:
        missing.append(_min)
    _min += timedelta(days=1)
print(missing)

Output:

[datetime.date(2014, 11, 22), datetime.date(2014, 11, 23)]
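
The same min-to-max walk can be wrapped in a function. This is only a sketch, not the answer's own code: it reuses the name `find_missing_dates` from the question and swaps the list lookup for a set lookup, on the assumption that the input may be long enough for repeated `in` checks on a list to matter.

```python
from datetime import date, timedelta


def find_missing_dates(dates):
    """Return the dates absent between min(dates) and max(dates), in order."""
    if not dates:
        return []
    present = set(dates)  # set membership tests are O(1) on average
    start, end = min(present), max(present)
    # Every calendar day from start up to (but excluding) end.
    all_days = (start + timedelta(days=n) for n in range((end - start).days))
    return [day for day in all_days if day not in present]


sample = [date(2014, 11, 21), date(2014, 11, 24), date(2014, 11, 25)]
print(find_missing_dates(sample))
```

Unlike the pairwise approach, this version does not need the input to be sorted, because it only relies on the minimum, the maximum, and membership.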
balderman
  • Sorry I have to accept Olvin's answer because it used a comprehension. But they are both good, the other one helped me understand what the problem is more. – v1z3 Oct 10 '21 at 18:53

If your `dates` list is sorted, you only need to iterate over it and add the dates that fall between each pair of consecutive entries to a new list. I've already posted a possible one-line solution in a comment:

from datetime import date, timedelta

dates = [
    date(2014, 11, 24), date(2014, 11, 25), date(2014, 11, 26),
    date(2014, 11, 28), date(2014, 11, 29), date(2014, 11, 30),
    date(2014, 12, 1)
]
missing = [d + timedelta(days=j) for i, d in enumerate(dates[:-1], 1) for j in range(1, (dates[i] - d).days)]

You can do the same thing with regular `for` loops:

from datetime import date, timedelta

dates = [
    date(2014, 11, 24), date(2014, 11, 25), date(2014, 11, 26),
    date(2014, 11, 28), date(2014, 11, 29), date(2014, 11, 30),
    date(2014, 12, 1)
]

missing = []
for next_index, current_date in enumerate(dates[:-1], 1):
    for days_diff in range(1, (dates[next_index] - current_date).days):
        missing.append(current_date + timedelta(days=days_diff))
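
The loop above pairs each date with its successor by index. As a sketch of the same idea, the pairs can also come from `zip(dates, dates[1:])`; the function name `gaps_between` and the sample list are made up for illustration, and the input is still assumed to be sorted.

```python
from datetime import date, timedelta


def gaps_between(sorted_dates):
    """Return the dates missing between consecutive entries of a sorted list."""
    missing = []
    # zip pairs each date with the one that follows it in the list.
    for current, following in zip(sorted_dates, sorted_dates[1:]):
        # range(1, n) is empty when the two dates are adjacent or equal.
        for offset in range(1, (following - current).days):
            missing.append(current + timedelta(days=offset))
    return missing


example = [date(2014, 11, 26), date(2014, 11, 28), date(2014, 12, 1)]
print(gaps_between(example))
```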
Olvin Roght
  • You're right, it does work. What's really weird is that your code works and my original expression works (in a script). The errors only happen with PDB (the debugger) open in the middle of the function, where it fails with `*** NameError: name 'dates' is not defined` – v1z3 Oct 10 '21 at 18:46
  • That's a nice minimal solution, thanks! I'll give it a little bit of time to see if anyone else has an explanation / solution. – v1z3 Oct 10 '21 at 18:47
  • @v1z3, it's weird as `dates` is definitely defined. – Olvin Roght Oct 10 '21 at 18:47
  • @v1z3, you can ask if you don't understand something in my code. – Olvin Roght Oct 10 '21 at 18:48
  • It may be an issue with the debugger variable scopes. I have two expressions failing and one succeeding in the debugger, all 3 work outside of it. – v1z3 Oct 10 '21 at 18:49
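
The scope explanation from the comments can be probed outside the debugger. The sketch below assumes the debugger evaluates a typed expression roughly the way `eval` does when given separate globals and locals mappings; the names `fake_globals`, `fake_locals` and `try_eval` are invented for the illustration. The outermost iterable of a comprehension is evaluated in the enclosing scope, while the element expression may run in the comprehension's own scope, which would explain why only the expression with `dates[0]` in front failed. The result for that expression can differ between Python versions, so the code reports it instead of assuming it.

```python
from datetime import date, timedelta

dates = [date(2014, 11, 24), date(2014, 11, 26), date(2014, 11, 28)]

# Mimic a debugger prompt inside a function: `dates` exists only in the
# locals mapping, not in the globals mapping.
fake_globals = {"date": date, "timedelta": timedelta}
fake_locals = {"dates": dates}

# `dates` appears only in the outermost iterable, which is evaluated in
# the enclosing scope, so the locals mapping is consulted.
outer_only = "{date(2014, 11, 24) + timedelta(x) for x in range((dates[-1] - dates[0]).days)}"

# `dates` also appears in the element expression, which may be evaluated
# in the comprehension's own nested scope.
inner_use = "{dates[0] + timedelta(x) for x in range((dates[-1] - dates[0]).days)}"


def try_eval(expression):
    """Evaluate the expression; return its value or the NameError raised."""
    try:
        return eval(expression, fake_globals, fake_locals)
    except NameError as exc:
        return exc


outer_result = try_eval(outer_only)
inner_result = try_eval(inner_use)
print(type(outer_result).__name__, type(inner_result).__name__)
```

If the second expression reports a `NameError` on your interpreter, that matches the behavior described in the comments; running the expression in a script, where `dates` is visible to the nested scope, avoids it.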