The results of an opinion pool are stored in CSV format in which questions with multiple answers are coded with 0 and 1. For example, if the question "which is your favorite color?" is asked to 100 people, and possible answers are limited to red, blue, green, a result such as
colors = ['red', 'blue', 'green']
votes = [33, 57, 10]
will be stored by means of three columns: one for 'red' answer containing 33 1's and 67 0's, one for 'blue' containing 57 1's and 43 0's, and one for 'green' containing 10 1's and 90 0's.
I would like to transform my data frame so that these three colums are merged into a unique column containing occurences of 'red', 'blue' and 'green'.
Here is a simplified example of my problem:
from pandas import DataFrame
actual_pool = {'foo': [0, 1, 2, 3, 4, 5, 6, 7],
'red': [1, 0, 0, 1, 0, 1, 0, 0],
'blue': [0, 1, 0, 0, 0, 0, 1, 1],
'green': [0, 0, 1, 0, 1, 0, 0, 0],
'bar': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']}
actual_df = DataFrame(actual_pool,columns= ['foo', 'red', 'blue', 'green', 'bar'])
expected_pool = {'foo': [0, 1, 2, 3, 4, 5, 6, 7],
'colors': ['red', 'blue', 'green', 'red', 'green', 'red', 'blue', 'blue'],
'bar': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']}
expected_df = DataFrame(expected_pool,columns= ['foo', 'colors', 'bar'])
print(actual_df)
print(expected_df)
The question is How can I get expected_df from actual_df with pandas?.
EDIT 1: add columns 'foo' and 'bar' because I am interested in merging a subsets of columns of data frames.