I have a large CSV file with the following structure:
doc_id, inclusion, id
1, TRUE, 842
1, FALSE, 768
1, FALSE, 292
1, FALSE, 393
2, TRUE, 191
2, TRUE, 389
2, TRUE, 171
...
The id is the primary key. doc_id is a foreign key and represents the document each id is linked to. Many ids are linked to each document. Each id is classified as included (i.e. inclusion == True) or excluded (i.e. inclusion == False).
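For reference, here is a minimal sketch of how a file with this layout could be loaded with pandas. The inline sample string is a stand-in for the real file, and `skipinitialspace=True` is an assumption based on the space after each comma in the sample; the name `citationData` matches the code further down.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the real CSV file
sample = """doc_id, inclusion, id
1, TRUE, 842
1, FALSE, 768
1, FALSE, 292
1, FALSE, 393
2, TRUE, 191
2, TRUE, 389
2, TRUE, 171
"""

# skipinitialspace=True drops the blank after each comma, so the column
# names come out as "doc_id", "inclusion", "id" and TRUE/FALSE parse as booleans
citationData = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(citationData.dtypes)
```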
I need to create a summary table that shows, for each doc_id, the number of ids included and excluded. I can count these across all doc_ids, but I am not sure how to do it per doc_id. For example, for the data above I want:
doc_id  included  excluded
1       1         3
2       3         0
...
This is my current code:
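# Counters for the whole file (not yet split by doc_id)
inc = 0
exc = 0
for index, row in citationData.iterrows():
    # The column is named "inclusion" in the CSV header
    if row.inclusion == True:
        inc = inc + 1
    else:
        exc = exc + 1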
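
One possible way to get the per-document counts is to group by `doc_id` and sum the flag. This is only a sketch: it assumes `citationData` is a pandas DataFrame with a boolean `inclusion` column, and the small frame below is a stand-in built from the sample rows, not the real file.

```python
import pandas as pd

# Stand-in for citationData, built from the sample rows above
citationData = pd.DataFrame({
    "doc_id":    [1, 1, 1, 1, 2, 2, 2],
    "inclusion": [True, False, False, False, True, True, True],
    "id":        [842, 768, 292, 393, 191, 389, 171],
})

grouped = citationData.groupby("doc_id")["inclusion"]

# Summing a boolean column counts the True values in each group
included = grouped.sum().astype(int)
# Group size minus the included count gives the excluded rows
excluded = grouped.size() - included

summary = pd.DataFrame({"included": included, "excluded": excluded}).reset_index()
print(summary)
```

This avoids the explicit `iterrows()` loop, which tends to be slow on large files.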