
I have this df in pandas:


import pandas as pd

df = {"id": ["apple", "potato", "lemon"], "som": [4, 2, 7], "value": [1, 2, 4]}
df = pd.DataFrame(df)

I would like to repeat each row as many times as the number in its `value` column, then drop that column. The desired output is:

results_df = {"id": ["apple", "potato", "potato", "lemon", "lemon", "lemon", "lemon"], "som": [4, 2, 2, 7, 7, 7, 7]}
results_df = pd.DataFrame(results_df)

How can I do this please?

vojtam
  • As per the suggested duplicate, try `df.reindex(df.index.repeat(df['value'])).reset_index(drop=True).drop('value', axis=1)`. – ouroboros1 Aug 22 '23 at 10:57

1 Answer


You can use numpy.repeat and pandas.DataFrame.loc:

import numpy as np
import pandas as pd

d = {"id": ["apple", "potato", "lemon"], "som": [4, 2, 7], "value": [1, 2, 4]}
df = pd.DataFrame(data=d)
# Repeat each index label `value` times, then select those rows with .loc
results_df = df.loc[np.repeat(df.index.values, df["value"])]
# Drop the helper column and renumber the rows from 0
results_df = results_df.drop(columns=["value"]).reset_index(drop=True)
print(results_df)

Output:

       id  som
0   apple    4
1  potato    2
2  potato    2
3   lemon    7
4   lemon    7
5   lemon    7
6   lemon    7
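
For comparison, here is a minimal sketch of the pandas-only variant hinted at in the comment under the question. It uses `Index.repeat` instead of `numpy.repeat`, so no NumPy import is needed. The frame is the same one from the question; the intermediate name `repeated` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": ["apple", "potato", "lemon"], "som": [4, 2, 7], "value": [1, 2, 4]}
)

# Index.repeat duplicates each index label by the matching count in `value`
repeated = df.loc[df.index.repeat(df["value"])]

# Drop the count column and renumber the rows from 0
results_df = repeated.drop(columns="value").reset_index(drop=True)
print(results_df)
```

With either approach, a row whose `value` is 0 is repeated zero times and so does not appear in the result.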
Sash Sinha