Reshaping a Dataframe with repeating column names

Question

I have data that looks like this:

    dataframe_1:                
    week    SITE        LAL SITE     LAL
0   1   BARTON CHAPEL   1.1 PENASCAL I  1
1   2   BARTON CHAPEL   1.1 PENASCAL I  1
2   3   BARTON CHAPEL   1.1 PENASCAL I  1

And, i need the final dataframe to look like this:

    dataframe_2:        
    week    SITE    LAL
0   1   BARTON CHAPEL   1.1
1   2   BARTON CHAPEL   1.1
2   3   BARTON CHAPEL   1.1
3   1   PENASCAL I  1
4   2   PENASCAL I  1
5   3   PENASCAL I  1

I've tried using 'melt' but I cannot get the desire result. Perhaps I'm using the wrong approach? thank you,

How did you arrive to `dataframe_1`? Seems there's probably a better way to approach it so that you don't have to "melt" it — Gene Burinsky, Jun 01 '23 at 15:04

score 3 · Answer 1 · answered Jun 01 '23 at 14:59

3

If you had unambiguous column names:

   week           SITE  LAL      SITE.1  LAL.1
0     1  BARTON CHAPEL  1.1  PENASCAL I      1
1     2  BARTON CHAPEL  1.1  PENASCAL I      1
2     3  BARTON CHAPEL  1.1  PENASCAL I      1

You could use pandas.lreshape:

pd.lreshape(df, {'SITE': ['SITE', 'SITE.1'], 'LAL': ['LAL', 'LAL.1']})

Output:

   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I  1.0
4     2     PENASCAL I  1.0
5     3     PENASCAL I  1.0

With your duplicated columns, you can melt + pivot:

out = (df
   .melt('week').assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
   .pivot(index=['idx', 'week'], columns='variable', values='value')
   .reset_index('week').rename_axis(index=None, columns=None)
)

Output:

   week  LAL           SITE
0     1  1.1  BARTON CHAPEL
0     2  1.1  BARTON CHAPEL
0     3  1.1  BARTON CHAPEL
1     1    1     PENASCAL I
1     2    1     PENASCAL I
1     3    1     PENASCAL I

answered Jun 01 '23 at 14:59

mozway

194,879
13
39
75

Hi, mozway, do you have any idea why `df.groupby(level=0, axis=1).agg(list)` throws error? – Ynjxsjmh Jun 01 '23 at 15:11
1

@Ynjxsjmh you need to use `.agg(lambda x: x.to_numpy().tolist())`. Else this applies `list` on the column names (try `.agg(''.join)`) ;) – mozway Jun 01 '23 at 17:11
1

@Ynjxsjmh your above question has now been asked (and answered) [here](https://stackoverflow.com/a/76809952/16343464) ;) – mozway Aug 01 '23 at 08:55
1

Excellent answer, thank you for remembering this :). – Ynjxsjmh Aug 04 '23 at 13:18
1

@Ynjxsjmh I have to say I struggled a bit to find your comment (the nickname didn't help :p) – mozway Aug 04 '23 at 13:20

Learning is a mess · Accepted Answer · 2023-06-01T15:06:41.107

1

Not a very generalizable solution, but will work on your example:

df.groupby('week').apply( lambda _df : pd.concat((_df.iloc[:,1:3], _df.iloc[:,3:5]))).reset_index('week')

it groups by week and then reshapes with column selection + concatenation. Removing a superfluous index column in the end.

edited Jun 01 '23 at 15:06

answered Jun 01 '23 at 14:59

Learning is a mess

7,479
7
35
71

score 0 · Answer 3 · answered Jun 01 '23 at 15:00

You can do this by extracting from the dataframe_1 the columns you want to extract and concatenate at the bottom:

# create a dummy dataframe with 6 columns and 10 rows
df = pd.DataFrame(np.random.randint(0,100,size=(10, 6)), columns=list('ABCDEF'))

df_tempo = df[['D','E','F']]

Then renaming the columns of the temporary dataframe with the same name in your case ['week', 'SITE', 'LAL']:

df_tempo.columns = ['A','B','C']

And then concatenate the temporary with the subset of the first dataframe you want to keep:

df = pd.concat([df[['A','B','C']], df_tempo], axis=0, ignore_index=True)

Hope it helps!

Reshaping a Dataframe with repeating column names

3 Answers3

Linked