1. Clean the data
I would suggest to firstly have a cleaner data frame to work, e.g. by creating a long
dataframe making it simpler to work:
import pandas as pd
df = pd.read_csv("API_NY.GDP.DEFL.KD.ZG_DS2_en_csv_v2_3358227.csv", header=2)
print(df)
# Country Name Country Code ... 2020 Unnamed: 65
# 0 Aruba ABW ... NaN NaN
# 1 Africa Eastern and Southern AFE ... 5.510381 NaN
# 2 Afghanistan AFG ... 5.497478 NaN
# 3 Africa Western and Central AFW ... 1.591080 NaN
# 4 Angola AGO ... 15.096997 NaN
# .. ... ... ... ... ...
# 261 Kosovo XKX ... 0.975958 NaN
# 262 Yemen, Rep. YEM ... NaN NaN
# 263 South Africa ZAF ... 5.286184 NaN
# 264 Zambia ZMB ... 21.628327 NaN
# 265 Zimbabwe ZWE ... 610.000000 NaN
id_vars = ['Country Code']
value_vars = [str(_) for _ in range(1960, 2020 + 1) if str(_) in df.columns]
var_name = "year"
value_name = "rate"
long = df.melt(id_vars=id_vars, value_vars=value_vars, var_name=var_name, value_name=value_name)
long.rate = long.rate / 100 + 1
print(long)
# Country Code year rate
# 0 ABW 1960 NaN
# 1 AFE 1960 NaN
# 2 AFG 1960 NaN
# 3 AFW 1960 NaN
# 4 AGO 1960 NaN
# ... ... ... ...
# 16221 XKX 2020 1.009760
# 16222 YEM 2020 NaN
# 16223 ZAF 2020 1.052862
# 16224 ZMB 2020 1.216283
# 16225 ZWE 2020 7.100000
2. Use groups
Pandas groupb
is very useful and allows here grouping by country code:
compounded = pd.DataFrame.from_records([
{
"Country Code": code,
"Compounded Rate": 100 * group.rate.product()
} for code, group in long.groupby("Country Code")
])
print(compounded)
# Country Code Compounded Rate
# 0 ABW 2.668155e+02
# 1 AFE 1.740744e+04
# 2 AFG 2.913292e+02
# 3 AFW 2.338532e+03
# 4 AGO 5.692971e+12
# .. ... ...
# 261 XKX 1.398405e+02
# 262 YEM 5.087848e+03
# 263 ZAF 1.816078e+04
# 264 ZMB 1.138452e+07
# 265 ZWE 2.436689e+04
Note that the compounded inflation rate is calculated via group.rate.product()
by multiplying (1 + ) the yearly inflation rates for a given CC.
However, e.g. for US, the result doesn't seem the be in line with the official numbers: we have 691.071696
vs 877.79
reported.
The discrepancy can be explained by the data: e.g. the 2020/2021 inflation is 101.211289
in the dataset whereas the gov website reports 101.40
.
Then again, I might also have made a mistake somewhere...