I'm new to this community but hope that someone will be able to help me with this issue:
I am trying to measure how plane efficiency changed after an intervention (a nudge) was implemented in 2014. For this, I need only a single breakpoint, at 2014, so that I can compare the phases before and after the implementation and then do the same for a control group.
Using fixed.psi = 2014, I was able to fix the first breakpoint (psi). However, R identifies a second breakpoint in 2016, which should not be included. What I want is a plot showing a linear regression from 2009 through 2014, followed by an independent linear regression from 2014 through 2019.
Here's my code so far:
# input data:
  year period nudge fcost. fconsumption         ask distance
1 2009      1     0     NA    396176200 34468768133       NA
2 2010      2     0     NA    403415300 33502639755       NA
3 2011      3     0     NA    381698000 35648670708       NA
4 2012      4     0     NA    409338200 37250324313       NA
5 2013      5     0     NA    398479550 39405973517       NA
6 2014      6     1     NA    406376750 40978703492
# load the packages used below
library(segmented)
library(ggplot2)
# get the data
setwd("/Users/username/R/Thesis")
vaa_data <- read.csv("dataset.csv", na="NA", sep=";")
vaa_data <- vaa_data[-12,]
vaa_data$efficiency <- with(vaa_data, ask/fconsumption)
# create a linear regression from 2009 to 2019 (all data)
model1 <- lm(efficiency ~ year, data=vaa_data)
# create a segmented regression with 2014 as the fixed psi/breakpoint
seg1 <- segmented(obj = model1,
                  seg.Z = ~ year,
                  psi = 2014,
                  fixed.psi = 2014)
# get the fitted values (stored as fitted_eff to avoid reusing the name of the fitted() function)
fitted_eff <- fitted(seg1)
segmodel <- data.frame(Year = vaa_data$year, Efficiency = fitted_eff)
# plot the fitted model
ggplot(segmodel, aes(x = Year, y = Efficiency)) + geom_line()
# output dput(vaa_data):
structure(list(year = 2009:2019, period = 1:11, nudge = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L), fcost. = c(NA, NA, NA,
1012000000L, 979000000L, 854700000L, 525500000L, 435200000L,
548600000L, 697900000L, 686300000L), fconsumption = c(517301594L,
486740423L, 502363575L, 511000000L, 498000000L, 482299406L, 460164739L,
423060756L, 413466614L, 426236232L, 434442862L), ask = c(4.87e+10,
4.65e+10, 4.92e+10, 5.0466e+10, 5.033e+10, 4.871e+10, 4.8385e+10,
4.7175e+10, 4.6154e+10, 4.7747e+10, 4.8832e+10), distance = c(148440000L,
138140000L, 145130000L, 149230000L, 154480000L, 149990000L, 150230000L,
142910000L, 138790000L, 150840000L, 159260000L), efficiency = c(94.1423737426179,
95.5334667159954, 97.9370369358487, 98.7592954990215, 101.064257028112,
100.995355569648, 105.147126451164, 111.508806550707, 111.626908768987,
112.020040567551, 112.40143243509)), row.names = c(NA, 11L), class = "data.frame")
Any suggestions on what to do? Thanks so much in advance!
This is what I have so far. The first segment is perfect, but I need to 'skip' the second breakpoint that appears around 2016.
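For reference, the target described above (one linear fit for 2009 through 2014 and an independent one for 2014 through 2019) could be sketched in base R without segmented. This is only a rough illustration under some assumptions: the efficiency values are copied from the dput() output, the year 2014 is counted in both phases, and the names vaa, pre, post, fit_pre and fit_post are made up for this example. Because the two fits are independent, the lines are not forced to meet at 2014.

```r
# Efficiency values copied from the dput() output in the post
vaa <- data.frame(
  year = 2009:2019,
  efficiency = c(94.1423737426179, 95.5334667159954, 97.9370369358487,
                 98.7592954990215, 101.064257028112, 100.995355569648,
                 105.147126451164, 111.508806550707, 111.626908768987,
                 112.020040567551, 112.40143243509)
)

# Split into the two phases; 2014 is included in both (an assumption)
pre  <- subset(vaa, year <= 2014)
post <- subset(vaa, year >= 2014)

# Fit one independent linear regression per phase
fit_pre  <- lm(efficiency ~ year, data = pre)
fit_post <- lm(efficiency ~ year, data = post)

# Plot the observations, one fitted line per phase, and mark 2014
plot(vaa$year, vaa$efficiency, xlab = "Year", ylab = "Efficiency")
lines(pre$year,  fitted(fit_pre))
lines(post$year, fitted(fit_post))
abline(v = 2014, lty = 2)
```

Comparing coef(fit_pre) and coef(fit_post) then gives the slope before and after the intervention, and the same steps could be repeated for the control group.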