How to melt dataframe when a column has two values for a variable in pandas?

Question

I have a dataset that describes a question paper. I would like to map it with the marks obtained by the students and do some analysis. For that purpose, I wanted to melt the question paper details.

 Question   Marks   BL  PI  CO  PSO PO
0   Q1  2   2   1.4 1   1   1,2
1   Q2  2   3   2.1 2   1   1,2
2   Q3  2   2   1.1 1   1   1,2
3   Q4  2   2   1.1 1   1   1,2
4   Q5  4   2   1.1 1   1   1,2

I tried

qns = pd.read_csv('https://raw.githubusercontent.com/davidrajdgl/CodeSnippets/master/questions.tsv', sep="\t")
qns_melted = qns.melt(id_vars = ["Question", "Marks"])
qns_melted

But this doesnot seems to be helpful since, I wanted the value column to be generated by spliting the columns in there is a comma in the cell value.

For example,

Q1  2   PO  1
Q1  2   PO  2

How to achieve this?

why PO values in your dataframe is 1,2 and in your expected output is 2? — ansev, Nov 17 '19 at 10:28
@ansev I wanted in that way only... since, I need to group by again by all the categories and then again inside...like "PO" is programme objective and 1 and 2 represents that this q2 falls under PO 1 and PO 2. So I need this information inf long format to merge with the students mark details for each question and then perform the analysis. — David, Nov 17 '19 at 11:37
After you've melted it, can you then not `.str.split(',').explode()` your value column? — Jon Clements, Nov 17 '19 at 11:39
(Although you might want to end up going with something like: `m = df.set_index(['Question', 'Marks']).melt()`then split/explode accordingly — Jon Clements, Nov 17 '19 at 11:41
@JonClements I get this error with this code: 'Series' object has no attribute 'explode' — David, Nov 17 '19 at 12:29

Valdi_Bo · Accepted Answer · 2019-11-17T15:39:57.753

1

One of possible options is:

split PO column into 2 new columns,
rename them as you wish,
drop the original PO column.

To perform this, run:

qns = qns.join(qns.PO.str.split(',', expand=True)\
    .rename(columns={0: 'PO_1', 1: 'PO_2'})).drop(columns='PO')

and then you can melt this DataFrame.

Edit

If the number of "segments" in PO column can vary, only a slight change is needed. Instead of renaming "new" columns by name, add a prefix:

qns = qns.join(qns.PO.str.split(',', expand=True)\
    .add_prefix('PO_')).drop(columns='PO')

But note that this time some columns can have None values, so to exclude them from the melt result, add .dropna() to melt instruction:

qns.melt(id_vars = ["Question", "Marks"]).dropna()

edited Nov 17 '19 at 15:39

answered Nov 17 '19 at 13:20

Valdi_Bo

30,023
4
23
41

This would work except the renaming part since I have to rename it dynamically. Because, I dont know how many PO_n's will be there in the dataset. Anything we can do about it? – David Nov 17 '19 at 15:12
Awesome...that helped perfectly...Thanks a lot.. Finally I just need to remove the "_[0-9]*" with empty string to get what I wanted. Thanks again...You saved my day.. – David Nov 17 '19 at 19:18

How to melt dataframe when a column has two values for a variable in pandas?

1 Answers1

Edit