I'm trying to pivot this dataframe:

pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['a', 'b'])

to this one:

pd.DataFrame([['a', [1, 2, 3]], ['b', [4, 5, 6]]], columns=['key', 'list'])

Ignoring the column renaming, is there a way to do it without iterating over the rows and converting them to a list and then a new column?

Sociopath
oshi2016
  • As per your comment on the answer: `I'm planning to merge this dataframe with another one based on the key, then sort it to find 'top 5 lists' based on some criteria`... You really should **not** want to do all this with a series of lists.. – jpp Oct 03 '18 at 12:44
  • the sorting will be based on other columns not on the series of lists one. Is that still an issue? If so I value your opinion, and happy for any suggestions for alternative solutions – oshi2016 Oct 04 '18 at 00:20

1 Answer

Don't do this. Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.

The main reason holding lists in a series is not recommended is that you lose the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a Python list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
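To make the dtype point concrete, here is a minimal sketch comparing a numeric series with a series of lists. The variable names are only illustrative and are not part of the original question.

```python
import pandas as pd

# Numeric column: values live in one contiguous NumPy array
numeric = pd.Series([1, 2, 3])

# Column of lists: each cell is a reference to a separate Python list
lists = pd.Series([[1, 2, 3], [4, 5, 6]])

print(numeric.dtype)
print(lists.dtype)

# Vectorised reduction runs in compiled code on the numeric series
total = numeric.sum()

# On the series of lists, work has to go through a Python-level call per cell
per_cell_totals = lists.apply(sum)
```

The second series reports `object` as its dtype, so anything done to its cells goes through `apply` or a loop rather than a vectorised method.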

See also What are the advantages of NumPy over regular Python lists? The arguments in favour of Pandas are the same as for NumPy.

But if you really need it:

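import pandas as pd
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['a', 'b'])
# One row per original column: its name as the key, its values as a list
df1 = pd.DataFrame({'key': df.columns, 'list': [df[x].tolist() for x in df.columns]})
print(df1)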
  key       list
0   a  [1, 2, 3]
1   b  [4, 5, 6]
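As a possible variation, the same result can probably be built from `DataFrame.to_dict` with `orient='list'`, which avoids writing the comprehension by hand. This is only a sketch on the sample data from the question; `col_lists` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['a', 'b'])

# Map each column name to a plain Python list of its values
col_lists = df.to_dict(orient='list')

# Keys become the 'key' column, the lists become the 'list' column
df1 = pd.DataFrame({'key': list(col_lists), 'list': list(col_lists.values())})
print(df1)
```

The resulting `list` column is still of object dtype, so the caveats above apply to it equally.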
jezrael
  • It seems the reason NOT to do it is mostly from bad performance if I plan to do computations on that column? is that correct? or any computation on the rest of the dataframe will be slow too? My main reason for wanting to set the lists in cells is so I could sort the dataframe based on other columns, and then extract the lists in the order needed – oshi2016 Oct 03 '18 at 10:37
  • @oshi2016 - yes, exactly. If you want to work with big data, it is a problem. – jezrael Oct 03 '18 at 10:38
  • @oshi2016 - so what is the expected output, finally? What do you mean by `sort the dataframe based on other columns`? – jezrael Oct 03 '18 at 10:40
  • The provided answer is exactly what I've needed, many thanks (will accept once the 5min limit expires). I'm planning to merge this dataframe with another one based on the key, then sort it to find 'top 5 lists' based on some criteria, and extract these lists for post processing outside of the dataframe... – oshi2016 Oct 03 '18 at 10:45
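For the workflow described in these comments, one alternative that avoids a column of lists is to rank the keys on their own and only then pull the matching columns out of the original dataframe. The sketch below assumes a hypothetical second dataframe `criteria` with a hypothetical `score` column; neither appears in the original thread.

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['a', 'b'])

# Hypothetical second dataframe holding the ranking criteria per key
criteria = pd.DataFrame({'key': ['a', 'b'], 'score': [0.2, 0.9]})

# Rank the keys on the criteria alone; no list column is involved
top_keys = criteria.sort_values('score', ascending=False)['key'].head(5)

# Extract the selected columns as lists, in ranked order, for post-processing
top_lists = [df[k].tolist() for k in top_keys]
print(top_lists)
```

Here the lists are only materialised at the very end, so the sorting and any merging stay on ordinary numeric and string columns.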