Group together arbitrary date objects that are within a time range of each other

Question

I want to split the calendar into two-week intervals starting at 2008-May-5, or any arbitrary starting point.

So I start with several date objects:

import datetime as DT

raw = ("2010-08-01",
       "2010-06-25",
       "2010-07-01",
       "2010-07-08")

transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d").date(),
                 "Some data here") for datestring in raw]
transactions.sort()

By analyzing the dates manually, I can figure out which dates fall within the same fortnight interval. I want to get a grouping similar to this one:

# Fortnight interval 1
(datetime.date(2010, 6, 25), 'Some data here')
(datetime.date(2010, 7, 1), 'Some data here')
(datetime.date(2010, 7, 8), 'Some data here')

# Fortnight interval 2
(datetime.date(2010, 8, 1), 'Some data here')

unutbu · Accepted Answer · 2010-08-07T13:15:49.727

12

import datetime as DT
import itertools

start_date=DT.date(2008,5,5)

def mkdate(datestring):
    return DT.datetime.strptime(datestring, "%Y-%m-%d").date()

def fortnight(date):
    return (date-start_date).days //14

raw = ("2010-08-01",
       "2010-06-25",
       "2010-07-01",
       "2010-07-08")
transactions=[(date,"Some data") for date in map(mkdate,raw)]
transactions.sort(key=lambda (date,data):date)

for key,grp in itertools.groupby(transactions,key=lambda (date,data):fortnight(date)):
    print(key,list(grp))

yields

# (55, [(datetime.date(2010, 6, 25), 'Some data')])
# (56, [(datetime.date(2010, 7, 1), 'Some data'), (datetime.date(2010, 7, 8), 'Some data')])
# (58, [(datetime.date(2010, 8, 1), 'Some data')])

Note that 2010-6-25 is in the 55th fortnight from 2008-5-5, while 2010-7-1 is in the 56th. If you want them grouped together, simply change start_date (to something like 2008-5-16).
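To check the effect of moving the origin, here is a minimal sketch that recomputes the fortnight numbers with both start dates. The helper name `fortnight_from` is hypothetical (it is not from the answer); it only parameterizes the answer's `fortnight` function by its start date.

```python
import datetime as DT

def fortnight_from(start_date, date):
    # Hypothetical helper: number of whole 14-day periods from start_date to date.
    return (date - start_date).days // 14

dates = [DT.date(2010, 6, 25), DT.date(2010, 7, 1),
         DT.date(2010, 7, 8), DT.date(2010, 8, 1)]

# Origin used in the answer: 2010-06-25 lands one fortnight before 2010-07-01.
keys_may_5 = [fortnight_from(DT.date(2008, 5, 5), d) for d in dates]

# Shifted origin: the first three dates now share a fortnight.
keys_may_16 = [fortnight_from(DT.date(2008, 5, 16), d) for d in dates]

print(keys_may_5)   # [55, 56, 56, 58]
print(keys_may_16)  # [55, 55, 55, 57]
```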

PS. The key tool used above is itertools.groupby, which is explained in detail here.

Edit: The lambdas are simply a way to make "anonymous" functions. (They are anonymous in the sense that they are not given names like functions defined by def). Anywhere you see a lambda, it is also possible to use a def to create an equivalent function. For example, you could do this:

import operator
transactions.sort(key=operator.itemgetter(0))

def transaction_fortnight(transaction):
    date,data=transaction
    return fortnight(date)

for key,grp in itertools.groupby(transactions,key=transaction_fortnight):
    print(key,list(grp))
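
The `lambda (date,data): ...` form in the first listing relies on tuple parameter unpacking, which Python 3 no longer accepts. Below is a sketch of the same grouping that should run on Python 3, using `operator.itemgetter` and indexing instead of unpacking inside the lambda. The `groups` dict is an addition for illustration only.

```python
import datetime as DT
import itertools
import operator

start_date = DT.date(2008, 5, 5)

def fortnight(date):
    # Whole 14-day periods elapsed since start_date.
    return (date - start_date).days // 14

raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
transactions = [(DT.datetime.strptime(s, "%Y-%m-%d").date(), "Some data")
                for s in raw]
# groupby only merges neighbours, so sort by date first.
transactions.sort(key=operator.itemgetter(0))

groups = {}  # illustration only: fortnight number -> list of dates
for key, grp in itertools.groupby(transactions, key=lambda t: fortnight(t[0])):
    groups[key] = [date for date, data in grp]
    print(key, groups[key])
```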

edited Aug 07 '10 at 13:15

answered Aug 07 '10 at 12:35

2

`//14` is the same as `/14` in Python2, but is necessary in Python3 to get integer division (since `/14` gives floating-point division in Python3). By using `//14` you future-proof your code a little bit. See http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex – unutbu Aug 07 '10 at 12:57
1

`//` is usually described as integer division, but it is really floor division: the quotient is rounded down to the nearest integer. When used with floats, the result stays a float. – Tony Veijalainen Aug 07 '10 at 13:00
I'm not sure if I understand how the `lambda` works here. As I understand about `lambdas`, they're particularly useful for making them work over `iterable`s. Do `sort()` and `groupby()` perform some iteration operations on their `key`s? – Kit Aug 07 '10 at 13:01
Thank you, @Tony. It's good to point out that `//` is *not* the same as `/` (in Python2) when operating on floats. – unutbu Aug 07 '10 at 13:03
1

@Kit: In `groupby`, each element in `transactions` is handed to the `key` function. An element of `transactions` is a tuple `(date,data)`. The `key` function `lambda (date,data):fortnight(date)` receives `(date,data)` as input and returns `fortnight(date)`. This is just an integer used to classify which group `(date,data)` should be grouped with. – unutbu Aug 07 '10 at 13:07
@Kit: `lambda`s are just a way to create functions. They don't necessarily have anything to do with iterables, but you are right, they show up a lot with `sort` and `groupby` because those functions take `key` arguments which expect functions. I could rewrite the above without any `lambda`s. I'll edit my post to show what I mean. – unutbu Aug 07 '10 at 13:09
Please validate my understanding. In every iteration of `groupby`, `key` __sometimes__ receives a new integer value (or any type, integers at least in this example). Then every element of `transactions` that gets the same `key` gets grouped together. Am I correct? – Kit Aug 07 '10 at 13:18
1

You are correct, Kit. @unutbu: `//` is not limited to Python3; it behaves the same in Python2 (I normally use Python 2.7 and tested it there). So it is not the same as Python2 `/`, but it is the same as Python2 `//`. – Tony Veijalainen Aug 07 '10 at 13:21
1

`groupby` iterates over the elements of `transactions`. Each element is passed to the function specified by the `key` argument. The `key` function does not receive an integer, it returns an integer. The consecutive elements that have the same integer are grouped together. – unutbu Aug 07 '10 at 13:22
@Kit: You're very welcome. `itertools` is a great tool to have in your pocket, well worth every second spent studying it. – unutbu Aug 07 '10 at 13:29
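
Two points from the comments can be checked directly. The sketch below shows that `//` rounds down (and keeps floats as floats), and that `groupby` only merges consecutive elements with equal keys, which is why the list is sorted first. The sample day offsets and the helper `key_runs` are made up for illustration.

```python
import itertools

# Floor division rounds toward negative infinity; float operands give a float.
int_result = 15 // 14        # 1
float_result = 15.0 // 14    # 1.0
negative_result = -1 // 14   # -1: a date one day before the start is in fortnight -1

def key_runs(values):
    # Hypothetical helper: the keys groupby produces, one per consecutive run.
    return [key for key, _ in itertools.groupby(values, key=lambda x: x // 14)]

unsorted_days = [30, 3, 29, 5]  # day offsets, deliberately out of order
print(key_runs(unsorted_days))          # [2, 0, 2, 0]
print(key_runs(sorted(unsorted_days)))  # [0, 2]
```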

Tony Veijalainen · Answer 2 · 2010-08-07T13:15:11.067

Use itertools.groupby with a lambda function that divides the distance from the starting point by the length of the period.

>>> from itertools import groupby
>>> for i, group in groupby(range(30), lambda x: x // 7):
...     print(list(group))
...
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
[28, 29]

So with dates:

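import datetime as DT
import itertools as it

start = DT.date(2008,5,5)
lenperiod = 14  # period length in days

# transactions is the date-sorted list of (date, data) tuples from the question
for fnight,info in it.groupby(transactions,lambda data: (data[0]-start).days // lenperiod):
    print(list(info))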

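You can also use week numbers from strftime, with lenperiod given as a number of weeks (2 for a fortnight). Note that %W restarts at 0 every year, so fortnights computed this way do not carry over across a year boundary:

lenperiod = 2  # period length in weeks
for fnight,info in it.groupby(transactions,lambda data: int(data[0].strftime('%W')) // lenperiod):
    print(list(info))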
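
Because `%W` is a week number within the year, one way to keep week-based keys unambiguous is to include the year in the key. This is a sketch of that variation, not part of the original answer; `week_key` is a hypothetical helper name and `weeks_per_period` is assumed to be 2 for a fortnight.

```python
import datetime as DT
import itertools as it

weeks_per_period = 2  # assumption: a fortnight is two calendar weeks

def week_key(transaction):
    # Hypothetical helper: (year, index of the two-week block within that year).
    date = transaction[0]
    return (date.year, int(date.strftime('%W')) // weeks_per_period)

raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
transactions = sorted((DT.datetime.strptime(s, "%Y-%m-%d").date(), "Some data")
                      for s in raw)

keys = []
for key, info in it.groupby(transactions, week_key):
    keys.append(key)
    print(key, list(info))
```

With the sample dates this gives the same split as the day-based version with the 2008-05-05 origin: 2010-06-25 alone, then 2010-07-01 with 2010-07-08, then 2010-08-01.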

score 1 · Answer 3 · answered Mar 25 '16 at 23:17

Using a pandas DataFrame with resample works too. Take the OP's data, but replace "Some data here" with the letters of 'abcd'.

>>> import datetime as DT
>>> raw = ("2010-08-01",
...        "2010-06-25",
...        "2010-07-01",
...        "2010-07-08")
>>> transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d"), data) for
...                 datestring, data in zip(raw,'abcd')]
>>> transactions
[(datetime.datetime(2010, 8, 1, 0, 0), 'a'),
 (datetime.datetime(2010, 6, 25, 0, 0), 'b'),
 (datetime.datetime(2010, 7, 1, 0, 0), 'c'),
 (datetime.datetime(2010, 7, 8, 0, 0), 'd')]

Now try using pandas. First create a DataFrame, naming the columns and setting the indices to the dates.

>>> import pandas as pd
>>> df = pd.DataFrame(transactions,
...                   columns=['date','data']).set_index('date')
>>> df
           data
date
2010-08-01    a
2010-06-25    b
2010-07-01    c
2010-07-08    d

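Now use one of the time series offset aliases, 2W-SUN, to resample into two-week bins anchored on Sundays (each row is labelled with the Sunday that closes its bin), and concatenate the data in each bin with sum().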

>>> fortnight = df.resample('2W-SUN').sum()
>>> fortnight
           data
date
2010-06-27    b
2010-07-11   cd
2010-07-25    0
2010-08-08    a

Now drill into the data as needed, by date label

>>> fortnight.loc['2010-06-27']['data']
b

or index

>>> fortnight.iloc[0]['data']
b

or indices

>>> data = fortnight.iloc[:2]['data']
>>> data
date
2010-06-27     b
2010-07-11    cd
Freq: 2W-SUN, Name: data, dtype: object
>>> data[0]
b
>>> data[1]
cd
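
For readers without pandas, the binning that produced the labels above can be approximated with the standard library. This is only a sketch, under the assumption that each bin is closed on, and labelled by, its final Sunday, with the first bin ending on the first Sunday on or after the earliest date. `sunday_bins` is a hypothetical helper, and unlike resample it emits no row for empty bins.

```python
import datetime as DT

def sunday_bins(transactions, weeks=2):
    # Hypothetical helper: concatenate data per bin, keyed by the bin's closing Sunday.
    transactions = sorted(transactions)
    first = transactions[0][0]
    # weekday(): Monday == 0 ... Sunday == 6
    first_sunday = first + DT.timedelta(days=(6 - first.weekday()) % 7)
    span = 7 * weeks
    bins = {}
    for date, data in transactions:
        offset = (date - first_sunday).days
        # Ceiling division: a date exactly on a closing Sunday stays in that bin.
        index = -(-offset // span)
        label = first_sunday + DT.timedelta(days=index * span)
        bins[label] = bins.get(label, "") + data
    return bins

raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
transactions = [(DT.datetime.strptime(s, "%Y-%m-%d").date(), data)
                for s, data in zip(raw, "abcd")]
result = sunday_bins(transactions)
for label, data in result.items():
    print(label, data)
```

For the sample data this reproduces the non-empty rows of the resample output: b on 2010-06-27, cd on 2010-07-11, and a on 2010-08-08.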
