I'm have an excel document formatted like so (Columns are datasets, Rows are cell types, values are comma-delineated gene names)
I would like to reformat the sheet like so (Columns are still datasets, but Rows are now gene names, and values are cell types):
I was trying to do this in pandas. I imported the input as a dataFrame and called it 'test'. My logic was to loop over each column, and within loop over each row, take the comma-delineated values, split those, and then make each of those a new index.
This approach is obviously pretty inefficient, but I'm not even able to get it to work as expected yet (even though I'm not getting an error, just no output, Edit Note: 'blank' is the name of a new, blank Excel book)
Unworking attempt:
for dataSet in test.columns:
for index, rows in test.iterrows():
geneList = test[dataSet].loc[index].split(",")
for gene in geneList:
blank[dataSet].reindex([gene])
So two questions: 1. How can I get this example to work? 2. How can I accomplish this transformation more efficiently?
Thanks!