1

I have a dataframe in pandas of the sort

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "column": ["a", "b","a", "b", "a", "b", "a", "b"],
    "value": [1, 7, 6, 5, 4, 3, 1, 7]
})

I want to generate a new dataframe where we have

id value_a value_b
1 1 7
1 6 5
2 4 3
2 1 7

I tried many things, pivot, pivot table and so on, but all solutions seem to need index column which gets unique and the values being aggregated in some way. I want keep repeating id and have the values in the original order they appeared.

Thanks in advance!

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
Nikola
  • 620
  • 2
  • 5
  • 18

4 Answers4

2

You can try:

x = (
    df.groupby(["id", "column"])["value"]
    .agg(list)
    .unstack(level=1)
    .explode(["a", "b"])
    .add_prefix('value_')
    .reset_index()
)
x.columns.name = None
print(x)

Prints:

   id value_a value_b
0   1       1       7
1   1       6       5
2   2       4       3
3   2       1       7
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
2

Here is another way:

l = ['id','column']
(df.set_index([*l,df.groupby(l).cumcount()])['value']
.unstack(level=1)
.droplevel(1)
.rename('value_{}'.format,axis=1)
.reset_index()
.rename_axis(None,axis=1))

Output:

   id  value_a  value_b
0   1        1        7
1   1        6        5
2   2        4        3
3   2        1        7
rhug123
  • 7,893
  • 1
  • 9
  • 24
1

Concatenate dataframes filtered by column unique values:

res = pd.concat([df.loc[df.column.eq(c), ['id', 'value']]
                .set_index('id').set_axis([f'value_{c}'], axis=1)
                 for c in df.column.unique()], axis=1).reset_index()

   id  value_a  value_b
0   1        1        7
1   1        6        5
2   2        4        3
3   2        1        7
RomanPerekhrest
  • 88,541
  • 4
  • 65
  • 105
1

Add a column that ensures uniqueness, then pivot:

out = (df
       .assign(counter = df.groupby(['id', 'column']).cumcount())
       .pivot(index = ['id', 'counter'], columns = 'column')
       .droplevel('counter')
      ) 
out.columns = out.columns.map("_".join)
out.reset_index()
   id  value_a  value_b
0   1        1        7
1   1        6        5
2   2        4        3
3   2        1        7

sammywemmy
  • 27,093
  • 4
  • 17
  • 31