0

I did find this comment from @HenryHeikkinen very usefull How to group by week in postgresql

So I did correct my code. But still it is not ok. I did find some situation when it is not working correctly. When year is chaning during first week.

My data model keeps some value for each day in year. I want to group data by week and compare several years.

year 2018 is interesting 1.1.2018 is monday so first week of 2018 start at that day. last week of 2018 is 52 and starting at 24.12.2018

31.12.2018 is monday next week, but it is first week of 2019 and thats OK. But grouping by week, the result for that week (week 1 of year 2019) has week date = 31.12.2018, so extracting year info from that gives 2018 instead 2019 and thats make duplicity for week 1 of 2018.

Year 2019 has that problem at beging and at the end.

  • Week 1 of 2019 start at 31.12.2018
  • Week 1 of 2020 start at 30.12.2019

here is my django code which return week data

self.dataday_set.annotate(ww=TruncWeek('date')) \
                            .values('ww') \
                            .annotate(consumption=Sum('consumption'), amount=Max('amount'), w=ExtractWeek('ww'), r=ExtractYear('ww')) \
                            .order_by('-ww')

here is output section between 2017 and 2018, which is OK

{'ww': datetime.date(2018, 1, 8), 'consumption': Decimal('42.39700'), 'amount': Decimal('12878.230'), 'w': 2, 'r': 2018}
{'ww': datetime.date(2018, 1, 1), 'consumption': Decimal('56.60000'), 'amount': Decimal('12835.833'), 'w': 1, 'r': 2018}
{'ww': datetime.date(2017, 12, 25), 'consumption': Decimal('45.95800'), 'amount': Decimal('12779.233'), 'w': 52, 'r': 2017}
{'ww': datetime.date(2017, 12, 18), 'consumption': Decimal('43.71600'), 'amount': Decimal('12733.275'), 'w': 51, 'r': 2017}

                        

section between 2018 and 2019, and as you can see week 1 of 2019 is listed as week of 2018

{'ww': datetime.date(2019, 1, 14), 'consumption': Decimal('40.03400'), 'amount': Decimal('14911.079'), 'w': 3, 'r': 2019}
{'ww': datetime.date(2019, 1, 7), 'consumption': Decimal('47.14400'), 'amount': Decimal('14871.045'), 'w': 2, 'r': 2019}
{'ww': datetime.date(2018, 12, 31), 'consumption': Decimal('52.35000'), 'amount': Decimal('14823.901'), 'w': 1, 'r': 2018}
{'ww': datetime.date(2018, 12, 24), 'consumption': Decimal('48.69800'), 'amount': Decimal('14771.551'), 'w': 52, 'r': 2018}
{'ww': datetime.date(2018, 12, 17), 'consumption': Decimal('45.07700'), 'amount': Decimal('14722.853'), 'w': 51, 'r': 2018}
{'ww': datetime.date(2018, 12, 10), 'consumption': Decimal('44.17100'), 'amount': Decimal('14677.776'), 'w': 50, 'r': 2018}

and section between 2019 and 2020, same problem

{'ww': datetime.date(2020, 1, 13), 'consumption': Decimal('39.31200'), 'amount': Decimal('16976.174'), 'w': 3, 'r': 2020}
{'ww': datetime.date(2020, 1, 6), 'consumption': Decimal('50.84500'), 'amount': Decimal('16936.862'), 'w': 2, 'r': 2020}
{'ww': datetime.date(2019, 12, 30), 'consumption': Decimal('51.67200'), 'amount': Decimal('16886.017'), 'w': 1, 'r': 2019}
{'ww': datetime.date(2019, 12, 23), 'consumption': Decimal('49.71200'), 'amount': Decimal('16834.345'), 'w': 52, 'r': 2019}
{'ww': datetime.date(2019, 12, 16), 'consumption': Decimal('43.23600'), 'amount': Decimal('16784.633'), 'w': 51, 'r': 2019}
{'ww': datetime.date(2019, 12, 9), 'consumption': Decimal('41.13100'), 'amount': Decimal('16741.397'), 'w': 50, 'r': 2019}

I am passing that queryset to pandas dataframe and creating pivot table from field r w and consumption.

Thanks for any tip how could I solve this in django instead of creating some custom code. Or maybe there is nice solution with pandas using just ww and consumption field to create pivot table where I have years in row and weeks in columns

janci
  • 49
  • 6

2 Answers2

0

This issue or reasoning is described here: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.

In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.

I suggest you calculate the week like this:

self.dataday_set\
    .values('date')\
    .annotate(
        consumption=Sum('consumption'),
        amount=Max('amount'),
        w=Greatest(Round(Extract('date', 'doy') / 7), 1),
        r=Extract('date', 'year')
    )\
    .order_by('-date')

Round() is used to round to he nearest integer (PostgreSQL rounds 0.5 to 1), Greatest() is used to adjust for the first week of the year.

Overall, this is still going to be an issue in certain situations. For example, 2022-01-03 will be Week 1, maybe you expect Week 2. Week 1 is correct, if you consider that week starts on Monday. But 2022-01-17 will be calculated wrong.

You have to choose between ISO 8601 week or this.

I would use ISO 8601 as suggest by PostgreSQL and change the code to:

self.dataday_set\
    .values('date')\
    .annotate(
        consumption=Sum('consumption'),
        amount=Max('amount'),
        w=Extract('date', 'week'),
        r=Extract('date', 'isoyear')
    )\
    .order_by('-date')
Borut
  • 3,304
  • 2
  • 12
  • 18
0

I did find solution and final django code looks like this

self.dataday_set.annotate(ww=TruncWeek('date')) \
                            .values('ww') \
                            .annotate(consumption=Sum('consumption'), amount=Max('amount'), w=ExtractWeek('ww'), r=ExtractIsoYear('ww')) \
                            .order_by('-ww')
janci
  • 49
  • 6