My dataframe consists of scores for different questions asked in a survey, over 3 fiscal years (FY13, FY14 & FY15).
The results are presented by Region.
Here's what a sample of the actual dataframe looks like, where we have two questions per region, asked in different years.
testdf <- data.frame(
  FY = c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15",
         "FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
  Region = c(rep("AFRICA",5), rep("ASIA",5), rep("AMERICA",6)),
  QST = c(rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",3)),
  Very.Satisfied = runif(16, min = 0, max = 1),
  Total.Very.Satisfied = floor(runif(16, min = 10, max = 120))
)
My Objective
For each region, my objective is to identify which question experienced the most significant upward evolution across this 3-year time frame. To measure upward movement, I have decided to use the slope of a linear regression as the metric.
The question with the most significant upward evolution within a region over the 3-year time frame will be the one with the steepest positive slope.
Using this logic, I have decided to do the following:
1) For each combination of Region and QST, I run the lm function.
2) I extract the slope for each combination and store it as a separate variable. Then, for each region, I keep only the question with the maximum slope value.
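To make the two steps above concrete, here is a rough base-R sketch of the computation I have in mind, run on a small made-up data frame rather than the real survey data. It assumes the fiscal year is converted to a number (the Year column and the toy values are my own illustration) and that the score is regressed on the year, so the slope reads as change in Very.Satisfied per year.

```r
# Toy data (made-up values, not the survey data): two questions in one region.
toy <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY13", "FY14", "FY15"),
  Region = rep("AFRICA", 6),
  QST    = rep(c("Q2", "Q5"), each = 3),
  Very.Satisfied = c(0.2, 0.3, 0.4, 0.5, 0.7, 0.9)
)

# Assumption: turn "FY13" into the number 13 so the year can act as the predictor.
toy$Year <- as.numeric(sub("FY", "", toy$FY))

# Step 1: fit lm() for each Region/QST combination and keep the slope on Year.
groups <- split(toy, list(toy$Region, toy$QST), drop = TRUE)
slopes <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    Region = d$Region[1],
    QST    = d$QST[1],
    slope  = unname(coef(lm(Very.Satisfied ~ Year, data = d))["Year"])
  )
}))

# Step 2: within each region, keep the question with the largest slope.
best <- do.call(rbind, lapply(split(slopes, slopes$Region), function(d) {
  d[which.max(d$slope), ]
}))
print(best)
```

In this toy example Q2 rises by 0.1 per year and Q5 by 0.2 per year, so Q5 is the row kept for AFRICA.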
My Attempt
Here is my attempt at solving this.
test_final = testdf %>%
  group_by(Region, QST) %>%
  map(~lm(FY ~ Very.Satisfied, data = .)) %>%
  map_df(tidy) %>%
  filter(term == 'circumference') %>%
  select(estimate) %>%
  summarise(Value = max(estimate))
However, when I run this, I get an error message saying that object FY was not found.
Additional requirement
Also, I'd like this to work only for questions that have at least 2 consecutive years of data for comparison, but I'm unable to figure out how to factor this condition into my code.
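To be precise about what I mean by "at least 2 consecutive years", the condition could be expressed with a small check like the one below. This is only a sketch: has_consecutive_years is a hypothetical helper name of my own, and it assumes the FY labels always have the form "FY" followed by a two-digit year.

```r
# Hypothetical helper: TRUE when at least two of the fiscal years are adjacent.
has_consecutive_years <- function(fy) {
  # Strip the "FY" prefix, then sort the distinct years numerically.
  years <- sort(unique(as.numeric(sub("FY", "", fy))))
  # A gap of exactly 1 between neighbouring years means they are consecutive.
  any(diff(years) == 1)
}

has_consecutive_years(c("FY13", "FY14", "FY15"))  # TRUE
has_consecutive_years(c("FY14", "FY15"))          # TRUE
has_consecutive_years(c("FY13", "FY15"))          # FALSE
```

In the sample data above, the ASIA / Q5 combination only has FY13 and FY15, so it is the kind of group I would want excluded before the slopes are compared.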
Any help with this would be greatly appreciated.