Is there a better solution than df['weekofyear'] = df['date'].dt.weekofyear?

The problem with this solution is that, sometimes, the days after the last week of year n but before the first week of year n+1 are counted as week 1 and not as week 0.

I am working with pyspark and koalas (no pandas allowed).

Here is an example:

[Image: Problematic df]

As you can see, the first column is Date, the second one is week, the third is month and last is year.

Mark Rotteveel
Ousen92i
  • could you show examples of some dates that suffer from this problem? – mck Dec 24 '20 at 10:48
  • Yes of course, I edited the post – Ousen92i Dec 24 '20 at 10:52
  • Date - Weekofyear - Month - Year. But the problem is only on the second column. – Ousen92i Dec 24 '20 at 10:55
  • why would you want them to be week 0...? they are technically on the same week, though separated across the year. – mck Dec 24 '20 at 11:04
  • Because I want to groupby(weekofyear) my data. The problem is that, for example, week 1 takes the real first week of 2019 but also adds the last days of 2019, which are counted as week 1 as well. It makes no sense. – Ousen92i Dec 24 '20 at 11:08
  • @Joe Not really, similar problem but no viable solution – Ousen92i Dec 24 '20 at 11:09
  • I guess you want them to be week 53, not week 0, right? – mck Dec 24 '20 at 11:12
  • ISO 'week number' is only useful when paired with the *year* of the ISO week number, which won't necessarily be the calendar year. e.g., `date -d '2019-12-31' +%G-W%V` gives `2020-W01`. If there's no way to get that year, `weekofyear` is difficult to use correctly. – Joe Dec 24 '20 at 12:40
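
The point about pairing the week number with its ISO year can be checked in plain Python. This is only a minimal sketch using the standard library's `date.isocalendar()`, not pyspark or koalas; the helper name `iso_year_week` is hypothetical and introduced here for illustration.

```python
from datetime import date


def iso_year_week(d: date) -> tuple:
    """Return (ISO year, ISO week) for a date (hypothetical helper)."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return iso_year, iso_week


# 2019-12-31 belongs to ISO week 1 of ISO year 2020, not of calendar year 2019.
print(iso_year_week(date(2019, 12, 31)))  # (2020, 1)

# 2019-01-01 is in ISO week 1 of 2019, so grouping on the week number alone
# (or on calendar year plus week number) puts both dates in the same group.
print(iso_year_week(date(2019, 1, 1)))  # (2019, 1)
```

Grouping on the (ISO year, ISO week) pair instead of the week number alone keeps each week in one piece, at the cost of a week that can span two calendar years.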

1 Answer

Not sure if this is what you want, but I suppose you can use a SQL case when expression to replace the undesired week-of-year values.

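import databricks.koalas as ks

# Week number of each date; late-December dates can come out as week 1
df['weekofyear'] = df['date'].dt.weekofyear

# Relabel week 1 in December as week 53 so it is not merged with the first week of January
df2 = ks.sql("""
select
    date,
    case when weekofyear = 1 and month = 12 then 53 else weekofyear end as weekofyear,
    month,
    year
from {df}""")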
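
To see which dates the case when rule actually touches, here is a small plain-Python restatement of it. It is a sketch only: it uses the standard library's ISO week number as a stand-in for the `weekofyear` column, and `adjusted_week` is a hypothetical name, not a koalas or Spark function.

```python
from datetime import date


def adjusted_week(d: date) -> int:
    """Mirror the case when rule: ISO week 1 in December becomes week 53."""
    week = d.isocalendar()[1]  # ISO week number, assumed equivalent to weekofyear
    if week == 1 and d.month == 12:
        return 53
    return week


# Print date, raw ISO week, and adjusted week around the 2019/2020 boundary.
for d in (date(2019, 12, 29), date(2019, 12, 30), date(2019, 12, 31), date(2020, 1, 1)):
    print(d, d.isocalendar()[1], adjusted_week(d))
```

Here 2019-12-30 and 2019-12-31 move from week 1 to week 53, while 2019-12-29 stays in week 52 and 2020-01-01 stays in week 1. The rule does not cover the opposite case: 2022-01-01, for example, has ISO week 52, the same number as 2022-12-26, so early-January dates may need a similar branch.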
mck
  • @Ousen92i I've added another way, which is more pythonic and pandas-like - see if that works for you? – mck Dec 24 '20 at 12:29
  • When I tried the 2nd way, I got an error : 'Series' object does not support item assignment – Ousen92i Dec 24 '20 at 13:59
  • @Ousen92i okay, I didn't expect it to work anyway, how about the first one? – mck Dec 24 '20 at 14:00