How can I calculate the percentage of given binary variables between groups in a dataframe

Question

I have a dataframe in which multiple people answer multiple questions. Answers are coded as 1 = agree and 0 = do not agree. The dataframe has one row per answer: each person answers 8 questions, so there are 8 rows per person. For every person, I would like to calculate the percentage of "agree" (1) answers out of all the questions that person answered (hence 8).

score 0 · Answer 1 · answered Jan 04 '21 at 19:26

# Display how the targets are distributed
def configure_target_statistic(targets):
    trg_cnt = targets.value_counts()  # count of each distinct answer (0 or 1)
    labels = np.array(trg_cnt.index)
    sizes = np.array(100 * trg_cnt / trg_cnt.sum())  # share of each answer, in percent
    layout = go.Layout(title='Target Distribution', font=dict(size=15), width=500, height=500)
    fig = go.Figure(data=[go.Pie(labels=labels, values=sizes)], layout=layout)
    py.iplot(fig)  # draws the pie chart inline in a notebook
    return trg_cnt

configure_target_statistic(df['answers'])

Apart from the function, you only need the imports. These should be enough:

import numpy as np
import pandas as pd
import plotly.offline as py
import plotly.graph_objs as go
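If the plot is not needed, the same percentages can probably be obtained without plotly. The following is a minimal sketch, assuming a hypothetical `answers` Series of 0/1 values; note that it gives the distribution over the whole column, not per person.

```python
import pandas as pd

# Hypothetical sample of 0/1 answers (not from the original question)
answers = pd.Series([1, 1, 1, 0], name="answers")

# normalize=True returns fractions instead of counts; scale to percent
pct = 100 * answers.value_counts(normalize=True)
print(pct)
```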

score 0 · Answer 2 · answered Jan 04 '21 at 20:55

Assuming your dataframe has two columns user_id and question_id, by which you identify each row, here is a simple solution:

import pandas as pd
df = pd.DataFrame([[1, 6, 1], [1, 7, 1], [2, 6, 1], [2, 7, 0]], columns=['user_id', 'question_id', 'agree'])
grp = df.groupby(['user_id'])['agree']
print(100 * grp.sum() / grp.count())

In the last line, the percentage is computed over the number of questions each user actually answered (grp.count()), not over a fixed total.
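Two variations on this may be useful, sketched here under assumptions rather than as a definitive answer. Because the column holds only 0 and 1, the group mean should equal sum divided by count. If the percentage should instead be relative to a fixed number of questions, divide the sum by that number; `N_QUESTIONS` is a hypothetical constant, set to 2 for this toy data (the question mentions 8).

```python
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame(
    [[1, 6, 1], [1, 7, 1], [2, 6, 1], [2, 7, 0]],
    columns=["user_id", "question_id", "agree"],
)

# The mean of a 0/1 column is the share of 1s, so this matches 100 * sum / count
pct_answered = 100 * df.groupby("user_id")["agree"].mean()

# Alternative: divide by a fixed number of questions per person
N_QUESTIONS = 2  # assumption for this toy data; use 8 for the data in the question
pct_fixed = 100 * df.groupby("user_id")["agree"].sum() / N_QUESTIONS

print(pct_answered)
print(pct_fixed)
```

The two results differ only when a person has fewer rows than the fixed total, for example if some questions were skipped.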
