Transform a table of two columns (attributes and description) to multiple columns where each column is an attribute: using PANDAS

Question

I currently have a table like this:

ATRIBUTE    DESCRIPTION
CODIGO        A1
TITULO        A2
AUTOR         A3
SUMARIO       A4
CODIGO        B1
TITULO        B2
AUTOR         B3
SUMARIO       B4
EXTENSION     B5
CODIGO        C1
AUTOR         C3
SUMARIO       C4
EXTENSION     C5
NOTAS         C6
OTROS         C7
 ...          ...

and I need to get a table like this:

CODIGO  TITULO  AUTOR   SUMARIO EXTENSION   NOTAS   OTROS
  A1      A2     A3      A4       NAN        NAN     NAN
  B1      B2     B3      B4       B5         NAN     NAN
  C1      NAN    C3      C4       C5         C6      C7
  ...     ...    ...     ...      ...        ...     ...

I´ve been trying pivoting or melting, but I can´t find a way to get the result I need.

Does anybody can give me a suggestion?

Thanks,

score 2 · Answer 1 · answered Jun 30 '23 at 01:01

Try:

df['tmp'] = df['ATRIBUTE'].eq('CODIGO').cumsum()

out = df.pivot(index='tmp', columns='ATRIBUTE', values='DESCRIPTION')
out.columns.name, out.index.name = None, None
print(out)

Prints:

  AUTOR CODIGO EXTENSION NOTAS OTROS SUMARIO TITULO
1    A3     A1       NaN   NaN   NaN      A4     A2
2    B3     B1        B5   NaN   NaN      B4     B2
3    C3     C1        C5    C6    C7      C4    NaN

score 0 · Answer 2 · answered Jun 30 '23 at 03:41

Here is a way:

(df.set_index(['ATRIBUTE',df.groupby(df['ATRIBUTE']).cumcount()])['DESCRIPTION']
.unstack(level=0)
.reindex(df['ATRIBUTE'].unique(),axis=1))

Output:

ATRIBUTE CODIGO TITULO AUTOR SUMARIO EXTENSION NOTAS OTROS
0            A1     A2    A3      A4        B5    C6    C7
1            B1     B2    B3      B4        C5   NaN   NaN
2            C1    NaN    C3      C4       NaN   NaN   NaN

Transform a table of two columns (attributes and description) to multiple columns where each column is an attribute: using PANDAS

2 Answers2