0

I have a dataset that describes a question paper. I would like to map it with the marks obtained by the students and do some analysis. For that purpose, I wanted to melt the question paper details.

 Question   Marks   BL  PI  CO  PSO PO
0   Q1  2   2   1.4 1   1   1,2
1   Q2  2   3   2.1 2   1   1,2
2   Q3  2   2   1.1 1   1   1,2
3   Q4  2   2   1.1 1   1   1,2
4   Q5  4   2   1.1 1   1   1,2

I tried

qns = pd.read_csv('https://raw.githubusercontent.com/davidrajdgl/CodeSnippets/master/questions.tsv', sep="\t")
qns_melted = qns.melt(id_vars = ["Question", "Marks"])
qns_melted

But this doesnot seems to be helpful since, I wanted the value column to be generated by spliting the columns in there is a comma in the cell value.

For example,

Q1  2   PO  1
Q1  2   PO  2

How to achieve this?

David
  • 524
  • 1
  • 7
  • 24
  • why PO values in your dataframe is 1,2 and in your expected output is 2? – ansev Nov 17 '19 at 10:28
  • @ansev I wanted in that way only... since, I need to group by again by all the categories and then again inside...like "PO" is programme objective and 1 and 2 represents that this q2 falls under PO 1 and PO 2. So I need this information inf long format to merge with the students mark details for each question and then perform the analysis. – David Nov 17 '19 at 11:37
  • After you've melted it, can you then not `.str.split(',').explode()` your value column? – Jon Clements Nov 17 '19 at 11:39
  • (Although you might want to end up going with something like: `m = df.set_index(['Question', 'Marks']).melt()`then split/explode accordingly – Jon Clements Nov 17 '19 at 11:41
  • @JonClements I get this error with this code: 'Series' object has no attribute 'explode' – David Nov 17 '19 at 12:29
  • @David sure: *New in version 0.25.0.* – Jon Clements Nov 17 '19 at 12:30

1 Answers1

1

One of possible options is:

  • split PO column into 2 new columns,
  • rename them as you wish,
  • drop the original PO column.

To perform this, run:

qns = qns.join(qns.PO.str.split(',', expand=True)\
    .rename(columns={0: 'PO_1', 1: 'PO_2'})).drop(columns='PO')

and then you can melt this DataFrame.

Edit

If the number of "segments" in PO column can vary, only a slight change is needed. Instead of renaming "new" columns by name, add a prefix:

qns = qns.join(qns.PO.str.split(',', expand=True)\
    .add_prefix('PO_')).drop(columns='PO')

But note that this time some columns can have None values, so to exclude them from the melt result, add .dropna() to melt instruction:

qns.melt(id_vars = ["Question", "Marks"]).dropna()
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • This would work except the renaming part since I have to rename it dynamically. Because, I dont know how many PO_n's will be there in the dataset. Anything we can do about it? – David Nov 17 '19 at 15:12
  • Awesome...that helped perfectly...Thanks a lot.. Finally I just need to remove the "_[0-9]*" with empty string to get what I wanted. Thanks again...You saved my day.. – David Nov 17 '19 at 19:18