I think this one is simple but I am missing how to get the data out I want.
I have two fields in a dataframe called ClosedDate and Is_Article_Linked. I want to group this by ClosedDate's Month and then tally only only when Is_Article_Linked is true but still keep the total records.
So for example my code looks like this
data = [
[ ('ClosedDate', '2021-01-18T12:46:38.000+0000'), ('Is_Article_Linked', True) ],
[ ('ClosedDate', '2021-01-29T12:46:38.000+0000'), ('Is_Article_Linked', True) ],
[ ('ClosedDate', '2021-01-11T12:46:38.000+0000'), ('Is_Article_Linked', False) ],
[ ('ClosedDate', '2021-01-03T12:46:38.000+0000'), ('Is_Article_Linked', False) ],
[ ('ClosedDate', '2021-01-12T12:46:38.000+0000'), ('Is_Article_Linked', True) ]
]
df_ld = pd.DataFrame(data, columns=['ClosedDate', 'Is_Article_Linked'])
# Formatting the date field
df_ld['ClosedDate'] = pd.to_datetime(df_ld['ClosedDate'], format="%y-%m-%d", exact=False)
# Grouping by Month
relevant_links = df_ld.resample('M', on='ClosedDate').count()
# Renaming the index
relevant_links.index = relevant_links.index.rename('Date')
# Outputting to a dictionary
link_accuracy_by_month = relevant_links.reset_index().to_dict('records')
print(link_accuracy_by_month)
What this outputs is [{'Date': Timestamp('2021-01-31 00:00:00'), 'ClosedDate': 5, 'Is_Article_Linked': 5}]
What i am expecting to see is [{'Date': Timestamp('2021-01-31 00:00:00'), 'ClosedDate': 5, 'Is_Article_Linked': 3}]
So ClosedDate is my total records and Is_Article_Linked is only tallying the True's. Currently it is tallying all of them because i assume it is seeing a value in there.
Is there anyway to get the results I am looking for? I believe my answer is in the resample but i just can't figure out how to get it to count just the true's.
Thanks in advance!