I am analysing a data set race_df in R containing races and their lap times. Each row has a categorical variable race_df$raceID (which I will treat as a factor, so that each race effectively gets its own indicator variable) and an associated race_df$lap_time. I want to analyse the data set with linear regression and then apply shrinkage methods such as the lasso or ridge regression.
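
To show what I mean by using a factor as a set of indicator variables, here is a minimal sketch on made-up data. The lap times and the two race labels are assumptions for illustration only, not my real data.

```r
# Toy stand-in for race_df (values are invented for illustration)
race_df <- data.frame(
  raceID   = factor(c("A", "A", "A", "B", "B", "B")),
  lap_time = c(80, 82, 84, 100, 105, 110)
)

# model.matrix() expands the factor into an intercept plus one
# indicator column per non-reference race (here: raceIDB)
X <- model.matrix(~ raceID, data = race_df)

# Ordinary linear regression of lap time on the race indicators
fit <- lm(lap_time ~ raceID, data = race_df)
coef(fit)
```

With the default treatment contrasts, the intercept is the mean lap time of the reference race and each remaining coefficient is that race's difference from it.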
For shrinkage methods I need to standardise the data, but the average lap time will differ between races (due to the length of the track). So when standardising the column race_df$lap_time, surely I should standardise using the mean and standard deviation of the lap times within each race, not those of the whole column?
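
To make the two options concrete, here is a minimal sketch on made-up data. The lap times and the new column names lap_time_global and lap_time_by_race are assumptions for illustration only.

```r
# Toy stand-in for race_df (values are invented for illustration)
race_df <- data.frame(
  raceID   = factor(c("A", "A", "A", "B", "B", "B")),
  lap_time = c(80, 82, 84, 100, 105, 110)
)

# Option 1: standardise the whole column with one mean and one sd
race_df$lap_time_global <- as.numeric(scale(race_df$lap_time))

# Option 2: standardise within each race, using that race's own
# mean and standard deviation
race_df$lap_time_by_race <- ave(
  race_df$lap_time, race_df$raceID,
  FUN = function(x) (x - mean(x)) / sd(x)
)

# Check: the per-race version has mean 0 and sd 1 inside every race
tapply(race_df$lap_time_by_race, race_df$raceID, mean)
tapply(race_df$lap_time_by_race, race_df$raceID, sd)
```

Under option 1 the laps from the longer track all end up above zero and those from the shorter track below it, whereas option 2 removes the between-race difference in level and spread.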