Use Index.repeat with loc if the DataFrame has the default RangeIndex:
print (df.index.repeat(df['Col 2']))
Int64Index([0, 0, 0, 1, 1], dtype='int64')
df = df.loc[df.index.repeat(df['Col 2'])].reset_index(drop=True)
print (df)
   Col 1  Col 2
0   Adam      3
1   Adam      3
2   Adam      3
3  Sarah      2
4  Sarah      2
And then:
df.to_csv(file, index=False)
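For reference, here is a minimal self-contained sketch of the steps so far. The input frame is an assumption reconstructed from the printed output, and an in-memory buffer stands in for file:

```python
import io

import pandas as pd

# Assumed input, reconstructed from the printed output
df = pd.DataFrame({'Col 1': ['Adam', 'Sarah'], 'Col 2': [3, 2]})

# Repeat each index label as many times as the count in 'Col 2',
# then select those labels and rebuild a clean 0..n-1 index
out = df.loc[df.index.repeat(df['Col 2'])].reset_index(drop=True)

# io.StringIO stands in for a real file path here
buf = io.StringIO()
out.to_csv(buf, index=False)
csv_text = buf.getvalue()
```

The result is assigned to a new name (out) only so the original frame stays available for comparison.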
A general solution, which also works with a duplicated index or a DatetimeIndex, is to repeat the array of positions created by numpy.arange and select rows by position with iloc:
import numpy as np
df = df.iloc[np.arange(len(df)).repeat(df['Col 2'])].reset_index(drop=True)
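A small sketch of the positional approach on a frame whose index labels are duplicated. The frame and its 'x' labels are hypothetical, chosen only to make label-based selection ambiguous:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated index, so labels cannot identify rows
df = pd.DataFrame({'Col 1': ['Adam', 'Sarah'], 'Col 2': [3, 2]},
                  index=['x', 'x'])

# Positions 0..len(df)-1, each repeated by the count in 'Col 2'
pos = np.arange(len(df)).repeat(df['Col 2'])

# Select by position, which does not depend on the index labels
out = df.iloc[pos].reset_index(drop=True)
```

Because iloc works on integer positions, the same line should behave the same way for a DatetimeIndex or any other index type.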
EDIT:
A solution without repeat, using a list comprehension:
df = df.loc[[c for a, b in zip(df.index, df['Col 2']) for c in [a] * b]].reset_index(drop=True)
print (df)
   Col 1  Col 2
0   Adam      3
1   Adam      3
2   Adam      3
3  Sarah      2
4  Sarah      2
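To see what the list comprehension builds before it is passed to loc, it can be pulled out into its own variable. This is only an illustrative sketch; the input frame is assumed from the printed output and the name labels is made up here:

```python
import pandas as pd

# Assumed input, reconstructed from the printed output
df = pd.DataFrame({'Col 1': ['Adam', 'Sarah'], 'Col 2': [3, 2]})

# Pair each index label with its count, then emit the label that many times
labels = [c for a, b in zip(df.index, df['Col 2']) for c in [a] * b]

# Select the repeated labels and rebuild a clean index
out = df.loc[labels].reset_index(drop=True)
```

Like the first loc-based version, this relies on index labels identifying rows uniquely.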