2

I have a data table that looks like this:

Original Table

enter image description here

Importing this into pandas using df = pd.read_excel('H:/test/test.xlsx', header=[0,1,2]) gives this dataframe:

    Day                 Cats                   Dogs              
    Unnamed: 0_level_1  Small       Big        Little     Large    
    Unnamed: 0_level_2  young  old  young old  young old  young old
0   1                   1      2    1     5    3     2    8     6
1   2                   2      4    3     6    0     1    4     6
2   3                   3      6    6     7    9     3    5     0

I want to convert the dataframe into something that looks like this:

    Day Count  Animal   Size    Age
0   1   1      Cats     Small   young
1   2   2      Cats     Small   young
2   3   3      Cats     Small   young
3   1   2      Cats     Small   old
4   2   4      Cats     Small   old
5   3   6      Cats     Small   old
6   1   1      Cats     Big     young
7   2   3      Cats     Big     young
8   3   6      Cats     Big     young
9   1   5      Cats     Big     old
10  2   6      Cats     Big     old
11  3   7      Cats     Big     old
12  1   3      Dogs     Little  young
13  2   0      Dogs     Little  young
14  3   9      Dogs     Little  young
15  1   2      Dogs     Little  old
16  2   1      Dogs     Little  old
17  3   3      Dogs     Little  old
18  1   8      Dogs     Large   young
19  2   4      Dogs     Large   young
20  3   5      Dogs     Large   young
21  1   6      Dogs     Large   old
22  2   6      Dogs     Large   old
23  3   0      Dogs     Large   old

I tried this method to achieve the result:

df.columns = ['_'.join(col).strip() for col in df.columns.values]
df.rename(columns={'Day_Unnamed: 0_level_1_Unnamed: 0_level_2':'Day'}, inplace=True)
df = df.melt(id_vars=['Day'], var_name='INFO', value_name='Count')
df['INFO'] = df['INFO'].str.split('_')
df['Animal'] = df['INFO'].str[0]
df['Size'] = df['INFO'].str[1]
df['Age'] = df['INFO'].str[2]
df.drop(columns='INFO',inplace=True)

What I've done is I've flattened my multi-level headers into a plain dataframe, and used meltto unpivot the dataframe. Then I separated my df[INFO] column into the specific columns 'Animal','Size','Age'.

This method works, but I want to know if there are any ways to achieve this without having to flatten the table into a plain format and splitting the 'INFO' column because the dataset I want to work with have far more columns than just three.

SecretAgentMan
  • 2,856
  • 7
  • 21
  • 41
jski212002
  • 83
  • 1
  • 7

1 Answers1

1

Use melt:

out = (
  df.set_index(df.columns[0])
    .melt(var_name=['Animal', 'Size', 'Age'], value_name='Count', ignore_index=False)
    .rename_axis('Day')
    .reset_index()
)

Output:

>>> out
    Day Animal    Size    Age  Count
0     1   Cats   Small  young      1
1     2   Cats   Small  young      2
2     3   Cats   Small  young      3
3     1   Cats   Small    old      2
4     2   Cats   Small    old      4
5     3   Cats   Small    old      6
6     1   Cats     Big  young      1
7     2   Cats     Big  young      3
8     3   Cats     Big  young      6
9     1   Cats     Big    old      5
10    2   Cats     Big    old      6
11    3   Cats     Big    old      7
12    1   Dogs  Little  young      3
13    2   Dogs  Little  young      0
14    3   Dogs  Little  young      9
15    1   Dogs  Little    old      2
16    2   Dogs  Little    old      1
17    3   Dogs  Little    old      3
18    1   Dogs   Large  young      8
19    2   Dogs   Large  young      4
20    3   Dogs   Large  young      5
21    1   Dogs   Large    old      6
22    2   Dogs   Large    old      6
23    3   Dogs   Large    old      0
Corralien
  • 109,409
  • 8
  • 28
  • 52