reshaping the dataset in python

Question

I have this dataset:

Account	lookup	FY11USD	FY12USD	FY11local	FY12local
Sales	CA	1000	5000	800	4800
Sales	JP	5000	6500	10	15

Trying to arrive to get the data in this format: (below example has 2 years of data but no. of years can vary)

Account	lookup	Year	USD	Local
Sales	CA	FY11	1000	800
Sales	CA	FY12	5000	4800
Sales	JP	FY11	5000	10
Sales	JP	FY12	6500	15

I tried using the below script, but it doesn't segregate USD and local for the same year. How should I go about that?

df.melt(id_vars=["Account", "lookup"], 
    var_name="Year", 
    value_name="Value")

score 1 · Answer 1 · answered May 13 '22 at 02:05

You can piece it together like so:

dfn = (pd.concat( 
[df[["Account", "lookup", 'FY11USD','FY12USD']].melt(id_vars=["Account", "lookup"], var_name="Year", value_name="USD"),
df[["Account", "lookup", 'FY11local','FY12local']].melt(id_vars=["Account", "lookup"], var_name="Year", value_name="Local")[['Local']]], axis=1 ))
dfn['Year'] = dfn['Year'].str[:4]

Output

  Account lookup  Year   USD  Local
0   Sales     CA  FY11  1000    800
1   Sales     JP  FY11  5000     10
2   Sales     CA  FY12  5000   4800
3   Sales     JP  FY12  6500     15

sammywemmy · Accepted Answer · 2022-05-13T03:04:01.860

One efficient option is to transform to long form with pivot_longer from pyjanitor, using the .value placeholder ---> the .value determines which parts of the columns remain as headers:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(
     index = ['Account', 'lookup'], 
     names_to = ('Year', '.value'), 
     names_pattern = r"(FY\d+)(.+)")

  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     JP  FY11  5000     10
2   Sales     CA  FY12  5000   4800
3   Sales     JP  FY12  6500     15

Another option is to use stack:

temp = df.set_index(['Account', 'lookup'])
temp.columns = temp.columns.str.split('(FY\d+)', expand = True).droplevel(0)
temp.columns.names = ['Year', None]
temp.stack('Year').reset_index()

  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     CA  FY12  5000   4800
2   Sales     JP  FY11  5000     10
3   Sales     JP  FY12  6500     15

You can also pull it off with pd.wide_to_long after reshaping the columns:

index = ['Account', 'lookup']
temp = df.set_index(index)
temp.columns = (temp
                .columns
                .str.split('(FY\d+)')
                .str[::-1]
                .str.join('')
               )
(pd.wide_to_long(
      temp.reset_index(), 
      stubnames = ['USD', 'local'], 
      i = index, 
      j = 'Year', 
      suffix = '.+')
.reset_index()
)

  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     CA  FY12  5000   4800
2   Sales     JP  FY11  5000     10
3   Sales     JP  FY12  6500     15

reshaping the dataset in python

2 Answers2