I am trying to melt a dataframe based on a series of columns which are connected by a time series. The year 2000 y1 is equivalent to the year 2000 y0, 2000 y2 = 2002 y0 = 2001 y1 etc.
Consider the raw data
import pandas as pd
import numpy as np
# Raw Data
years = ["2000", "2001", "2002", "2003"]
num_combos = len(years)
products = ["A"] * num_combos
bools = [True, False, True, False]
bools1 = [False, True, False, False]
bools2 = [True, False, False, True]
bools3 = [False, False, True, False]
values = [100, 97, 80, 74]
cols = {"years": years,
"products": products,
"y0": bools,
"y1": bools1,
"y2": bools2,
"y3": bools3,
"value": values}
df = pd.DataFrame(cols)
The expected outcome of the data should be the following dataset. Note that this is only for one date but would hold true for each additional year. Additionally, there are multiple columns in the true dataset which should follow the same pattern as the demo "Result" column in df1 below.
# Expected Outcome
res_years = years * 4
products1 = ["A"] * 4
new_values = [100, np.nan, 80, np.nan]
years1 = ["2000"] * 4
time = [0, 1, 2, 3]
cols1 = {"years": years1,
"products": products1,
"time": time,
"y": bools,
"Result": new_values}
df1 = pd.DataFrame(cols1)
This is the current attempt using pd.melt. Is there a way of introducing this dependency between the "melted" columns and the "result" column?
# Current Melt
id_vars = ["years", "products", "value"]
value_variables = ["y0", "y1", "y2", "y3"]
var_name = "time"
var = "Result"
melted = df.melt(id_vars=id_vars,
value_vars=(*value_variables,),
value_name=var,
var_name=var_name)
melted["time"] = melted['time'].str.replace("y", '').astype(int)
melted = melted.sort_values(by=["years", "time"]).reset_index(drop=True)