I have a pandas dataframe that looks something like this:

import pandas as pd
df = pd.DataFrame({'a': ['A', 'B', 'C', 'A'], 'b': [1, 4, 1, 3], 'c': [0, 6, 1, 0], 'd': [1, 0, 0, 5]})

I want a dataframe that will look like this:

[image: the desired output dataframe]

The original dataframe is grouped by the values in column 'a', and the corresponding values are saved as a dictionary in a new column 'dict'. The key-value pairs are the column names and the values in those columns, respectively. If a value in column 'a' has multiple entries (e.g. A occurs twice in column 'a'), a list of dictionaries should be created for that value.

How can I do this? (Please ignore the grammatical mistakes, and ask if anything in the question is too vague.)

RemyM

1 Answer

Don't do this. Pandas was never designed to hold lists/tuples/dicts in series/columns. You can concoct expensive workarounds, but these are not recommended.

The main reason holding lists in series is not recommended is that you lose the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
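To make the dtype point concrete, here is a small sketch. The names `nested`, `total_fast` and `total_slow` are made up for illustration; the data is a cut-down version of the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'A'], 'b': [1, 4, 1, 3]})

# A plain numeric column is backed by a NumPy integer array.
numeric_dtype = df['b'].dtype

# A column holding dicts falls back to object dtype (pointers to Python objects).
nested = pd.Series([{'b': 1}, {'b': 4}, {'b': 1}, {'b': 3}])
nested_dtype = nested.dtype

print(numeric_dtype, nested_dtype)

# The numeric column supports a vectorised reduction ...
total_fast = df['b'].sum()
# ... while the nested column needs a Python-level loop for the same number.
total_slow = sum(d['b'] for d in nested)
```

Both totals agree, but only the first one runs inside NumPy.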

See also "What are the advantages of NumPy over regular Python lists?" The arguments in favour of Pandas are the same as for NumPy.

But if you really need it:

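# 'records' is spelled out: the 'r' shorthand is deprecated and rejected by recent pandas versions
df = df.groupby('a').apply(lambda x: x.to_dict('records')).reset_index(name='dict')
print(df)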
   a                                               dict
0  A  [{'a': 'A', 'b': 1, 'c': 0, 'd': 1}, {'a': 'A'...
1  B               [{'a': 'B', 'b': 4, 'c': 6, 'd': 0}]
2  C               [{'a': 'C', 'b': 1, 'c': 1, 'd': 0}]
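If the grouping key should not be repeated inside every dictionary (an assumption on my part, since the image from the question is not available), a variant is to iterate over the groups and convert only the value columns. `value_cols`, `rows` and `result` are illustrative names, not anything from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'A'],
                   'b': [1, 4, 1, 3],
                   'c': [0, 6, 1, 0],
                   'd': [1, 0, 0, 5]})

# Columns that should end up inside the dictionaries (assumed: everything except 'a').
value_cols = ['b', 'c', 'd']

# One output row per group: the key plus a list with one dict per original row.
rows = [
    {'a': key, 'dict': group[value_cols].to_dict(orient='records')}
    for key, group in df.groupby('a')
]
result = pd.DataFrame(rows)
print(result)
```

The 'dict' column is still object dtype, so the caveats above apply here as well.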
jezrael
  • I suspect the "if really need it" is a classic XY problem ;) – jpp Oct 03 '18 at 09:05
  • @jezrael thanks! This worked. Since you said pandas is not designed to hold lists/tuples/dicts in columns, can you tell me how this can be done with a NumPy array? – RemyM Oct 03 '18 at 09:53
  • @RemyM - pandas is designed for working with scalars, numpy too. – jezrael Oct 03 '18 at 10:05
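As a rough sketch of what "working with scalars" can look like here: keep the frame flat and let groupby or a boolean mask produce the per-group view when it is needed. This assumes the nested column was only a means to per-group access or aggregation, which the question does not confirm; `sums` and `rows_a` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'A'],
                   'b': [1, 4, 1, 3],
                   'c': [0, 6, 1, 0],
                   'd': [1, 0, 0, 5]})

# Aggregate per group directly on the flat, numeric columns.
sums = df.groupby('a')[['b', 'c', 'd']].sum()

# Pull the rows of a single group on demand with a boolean mask.
rows_a = df[df['a'] == 'A']

print(sums)
print(rows_a)
```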