2

I'm have an excel document formatted like so (Columns are datasets, Rows are cell types, values are comma-delineated gene names) Input Excel Format

I would like to reformat the sheet like so (Columns are still datasets, but Rows are now gene names, and values are cell types):

Output Excel Format

I was trying to do this in pandas. I imported the input as a dataFrame and called it 'test'. My logic was to loop over each column, and within loop over each row, take the comma-delineated values, split those, and then make each of those a new index.

This approach is obviously pretty inefficient, but I'm not even able to get it to work as expected yet (even though I'm not getting an error, just no output, Edit Note: 'blank' is the name of a new, blank Excel book)

Unworking attempt:

for dataSet in test.columns:
    for index, rows in test.iterrows():
        geneList = test[dataSet].loc[index].split(",")
        for gene in geneList:
            blank[dataSet].reindex([gene])

So two questions: 1. How can I get this example to work? 2. How can I accomplish this transformation more efficiently?

Thanks!

julianstanley
  • 1,367
  • 4
  • 13
  • 26

1 Answers1

2

There are duplicates in genes, so need:

You need create Series by stack, then is possible use split and reshape by stack. For columns from MultiIndex call reset_index.

Last use groupby for lists and last reshape by unstack:

df = df.stack().str.split(',', expand=True).stack().reset_index(name='a')
df = df.groupby(['a', 'level_1'])['level_0'].apply(','.join).unstack()
print (df)
level_1          dataset 1          dataset 2          dataset 3
a                                                               
gene1    cell1,cell2,cell3  cell1,cell2,cell3  cell1,cell2,cell3
gene2    cell1,cell2,cell3  cell1,cell2,cell3  cell1,cell2,cell3
gene3    cell1,cell2,cell3               None               None
gene4                 None  cell1,cell2,cell3               None
gene5                 None               None  cell1,cell2,cell3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252