I have a pandas dataframe school_df
that looks like this:
school_id date_posted date_completed
0 A 2014-01-01 2014-01-01
1 A 2014-01-01 2014-01-08
2 A 2014-04-29 2014-05-01
3 B 2014-01-01 2014-01-01
4 B 2014-01-20 2014-02-23
Each row represents one project by that school. I'd like to add two columns: for each unique school_id
, a count of how many projects were posted before that date and a count of how many projects were completed before that date.
The code below works, but I have ~300,000 unique schools, so it's taking a long time to run. Is there a faster way to get what I am looking for? Thank you for your assistance!
import pandas as pd
groups = school_df.groupby("school_id")
blank_df = pd.DataFrame()
for g, df in groups:
df['school_previous_projects'] = df.date_posted.map(lambda x: len(df[df.date_posted < x]))
df['school_previous_completed'] = df.date_posted.map(lambda x: len(df[df.date_completed < x]))
blank_df = pd.concat([blank_df, df])