Given your revised question and initial dataset (I've revised your dataset slightly to include two customers):
df = pd.DataFrame({'CustomerId': ['001', '001', '002', '002'],
'SaleDate': ['2017-01-10', '2017-04-10', '2017-08-10', '2017-09-10'],
'Quantity': [5, 1, 1, 6]})
You can easily include the average time between transactions (in days) in your group by with the following code:
NOTE: This will only work if you dataset is ordered by CustomerID then SaleDate.
import pandas as pd
df = pd.DataFrame({'CustomerId': ['001', '001', '002', '002'],
'SaleDate': ['2017-01-10', '2017-04-10', '2017-08-10', '2017-09-10'],
'Quantity': ['5', '1', '1', '6']})
# convert the string date to a datetime
df['SaleDate'] = pd.to_datetime(df.SaleDate)
# sort the dataset
df = df.sort_values(['CustomerId', 'SaleDate'])
# calculate the difference between each date in days
# (the .shift method will offset the rows, play around with this to understand how it works
# - We apply this to every customer using a groupby
df2 = df.groupby("CustomerId").apply(
lambda df: (df.SaleDate - df.SaleDate.shift(1)).dt.days).reset_index()
df2 = df2.rename(columns={'SaleDate': 'time-between-sales'})
df2.index = df2.level_1
# then join the result back on to the original dataframe
df = df.join(df2['time-between-sales'])
# add the mean time to your groupby
grouped = df.groupby("CustomerId").agg({
"SaleDate": ["min", "max"],
"Quantity": "sum",
"time-between-sales": "mean"})
# rename columns per your original specification
grouped.columns = grouped.columns.get_level_values(0) + grouped.columns.get_level_values(1)
grouped = grouped.rename(columns={
'SaleDatemin': 'EarliestSale',
'SaleDatemax': 'LatestSale',
'Quantitysum': 'NumPurchases',
'time-between-salesmean': 'avgTimeBetweenPurchases'})
print(grouped)
EarliestSale LatestSale NumPurchases avgTimeBetweenPurchases
CustomerId
001 2017-01-10 2017-04-10 6 90.0
002 2017-08-10 2017-09-10 7 31.0