Problem Statement: I'm working with transaction data for all of a hospital's visits and I need to remove every bad debt transaction after the first for each patient.
Issue I'm Having: My code works on a small dataset, but the actual data set is about 5GB and 13M rows. The code has been running for several days now and still hasn't finished. For background, my code is in a Jupyter notebook running on a standard work PC.
Sample Data
import pandas as pd
df = pd.DataFrame({"PatientAccountNumber":[113,113,113,113,225,225,225,225,225,225,225],
"TransactionCode":['50','50','77','60','22','77','25','77','25','77','77'],
"Bucket":['Charity','Charity','Bad Debt','3rd Party','Self Pay','Bad Debt',
'Charity','Bad Debt','Charity','Bad Debt','Bad Debt']})
print(df)
Sample Dataframe
PatientAccountNumber TransactionCode Bucket
0 113 50 Charity
1 113 50 Charity
2 113 77 Bad Debt
3 113 60 3rd Party
4 225 22 Self Pay
5 225 77 Bad Debt
6 225 25 Charity
7 225 77 Bad Debt
8 225 25 Charity
9 225 77 Bad Debt
10 225 77 Bad Debt
Solution
for account in df['PatientAccountNumber'].unique():
mask = (df['PatientAccountNumber'] == account) & (df['Bucket'] == 'Bad Debt')
df.drop(df[mask].index[1:],inplace=True)
print(df)
Desired Result (Each patient should have a maximum of one Bad Debt transaction)
PatientAccountNumber TransactionCode Bucket
0 113 50 Charity
1 113 50 Charity
2 113 77 Bad Debt
3 113 60 3rd Party
4 225 22 Self Pay
5 225 77 Bad Debt
6 225 25 Charity
8 225 25 Charity
Alternate Solution
for account in df['PatientAccountNumber'].unique():
mask = (df['PatientAccountNumber'] == account) & (df['Bucket'] == 'Bad Debt')
mask = mask & (mask.cumsum() > 1)
df.loc[mask, 'Bucket'] = 'DELETE'
df = df[df['Bucket'] != 'DELETE]
Attempted using Dask
I thought maybe Dask would be able to help me out, but I got the following error codes:
- Using Dask on first solution - "NotImplementedError: Series getitem in only supported for other series objects with matching partition structure"
- Using Dask on second solution - "TypeError: '_LocIndexer' object does not support item assignment"