I want to transform the structure of a pandas dataframe by unpivoting the column name into rows if the row has an "x" value.
Here is a small extraction of the excel file. There is a combination of categorical variables/dummy variables such as Line of business and four dummy variables indicating which data categories are used in a given process - such as Customer.
| Process name | Line of business | Customer | Potential customer | Employee | Vendor |
|--------------|------------------|----------|--------------------|----------|--------|
| Ad campaign | Marketing | x | x | | x |
| Payroll | HR | | | x | x |
df = pd.DataFrame({'Process name': {0: 'Ad campaign', 1: 'Payroll'},
'Line of business': {0: "Marketing", 1: "HR"},
'Customer': {0: "x", 1: ""},
'Potential customer': {0: "x", 1: ""},
'Employee': {0: "", 1: "x"},
'Vendor': {0: "x", 1: "x"}})
The desired output would look like this. This is only a small extraction of the data, so I would like a solution that does not need to reference the column names, and rather by indexing. I also need to do the unpivot operation on a few other groups of columns.
| Process name | Line of business | Data category |
|--------------|------------------|--------------------|
| Ad campaign | Marketing | Customer |
| Ad campaign | Marketing | Potential customer |
| Ad campaign | Marketing | Vendor |
| Payroll | HR | Employee |
| Payroll | HR | Vendor |
I have tried using pd.melt and pd.pivot, but I cant seem to figure it out.
Solution: Thanks for pointing me in the right direction @jezrael
Here is how I implemented it:
##pivot selected columns
df = df.melt(id_vars=['Process name', 'Line of business'],
value_vars=('Customer', 'Potential customer', 'Employee', 'Vendor'),
var_name='Data category')
##remove rows that does not contain x
df = df[df['value'].str.contains("x")]
##sort alphabetically
df = df.sort_values("Process name")