
I'm new to this community but hope that someone will be able to help me with this issue:

I am trying to measure how plane efficiency data changed after an intervention (a nudge) was introduced in 2014. For this, I need exactly one breakpoint, at 2014, so that I can compare the phases before and after the intervention and then repeat the comparison with a control group. Using `fixed.psi = 2014` I was able to fix the first breakpoint (psi); however, `segmented()` also identifies a second breakpoint in 2016, which should not be included. What I want is a plot that shows a linear regression from 2009 through 2014, followed by an independent linear regression from 2014 through 2019. Here's my code so far:

# input data:
  year period nudge fcost. fconsumption         ask distance
1 2009      1     0     NA    396176200 34468768133       NA
2 2010      2     0     NA    403415300 33502639755       NA
3 2011      3     0     NA    381698000 35648670708       NA
4 2012      4     0     NA    409338200 37250324313       NA
5 2013      5     0     NA    398479550 39405973517       NA
6 2014      6     1     NA    406376750 40978703492


# load the packages used below
library(segmented)
library(ggplot2)

# get the data
setwd("/Users/username/R/Thesis")
vaa_data <- read.csv("dataset.csv", na.strings = "NA", sep = ";")
vaa_data <- vaa_data[-12,]
vaa_data$efficiency <- with(vaa_data, ask/fconsumption)

# create a linear regression from 2009 to 2019 (all data)
model1 <- lm(efficiency ~ year, data=vaa_data)

# create a segmented regression with 2014 as the fixed psi/breakpoint
seg1 <- segmented(obj = model1,
                  seg.Z = ~ year,
                  psi = 2014,
                  fixed.psi = 2014)

# get the fitted values (stored under a name that does not shadow fitted())
fitted_eff <- fitted(seg1)
segmodel <- data.frame(Year = vaa_data$year, Efficiency = fitted_eff)

# plot the fitted model
ggplot(segmodel, aes(x = Year, y = Efficiency)) + geom_line()
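One possible way to end up with 2014 as the only breakpoint is to skip breakpoint estimation altogether and add a "hinge" term to an ordinary `lm()`. The sketch below is an assumption about what is wanted, not a confirmed fix: it treats the breakpoint as known, uses only base R, and hard-codes the efficiency values from the `dput(vaa_data)` output in this question (rounded to two decimals). The names `eff`, `bp`, `after` and `fit_hinge` are made up for illustration. As far as the `segmented` documentation describes it, values in `fixed.psi` are held fixed in addition to the breakpoints estimated from `psi`, which would explain the extra 2016 breakpoint; that reading is not verified here.

```r
# Efficiency values copied from the dput() output in this question,
# rounded to two decimals (illustrative data frame, not the original CSV)
eff <- data.frame(
  year = 2009:2019,
  efficiency = c(94.14, 95.53, 97.94, 98.76, 101.06, 101.00,
                 105.15, 111.51, 111.63, 112.02, 112.40)
)

bp <- 2014  # known intervention year, treated as fixed rather than estimated

# Hinge term: 0 up to the breakpoint, then the number of years since it
eff$after <- pmax(eff$year - bp, 0)

# Continuous piecewise-linear fit with exactly one, fixed breakpoint
fit_hinge <- lm(efficiency ~ year + after, data = eff)

# 'year' is the slope before 2014, 'after' is the change in slope from 2014 on
coef(fit_hinge)

# Fitted values, usable as the y variable in a line plot
eff$fitted_eff <- fitted(fit_hinge)
```

The two segments of this fit are forced to meet at 2014, so it gives one bend and no jump. If the two phases should be allowed to differ in level as well as slope, a model with an interaction term is the more natural choice.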


# output dput(vaa_data):
structure(list(year = 2009:2019, period = 1:11, nudge = c(0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L), fcost. = c(NA, NA, NA, 
1012000000L, 979000000L, 854700000L, 525500000L, 435200000L, 
548600000L, 697900000L, 686300000L), fconsumption = c(517301594L, 
486740423L, 502363575L, 511000000L, 498000000L, 482299406L, 460164739L, 
423060756L, 413466614L, 426236232L, 434442862L), ask = c(4.87e+10, 
4.65e+10, 4.92e+10, 5.0466e+10, 5.033e+10, 4.871e+10, 4.8385e+10, 
4.7175e+10, 4.6154e+10, 4.7747e+10, 4.8832e+10), distance = c(148440000L, 
138140000L, 145130000L, 149230000L, 154480000L, 149990000L, 150230000L, 
142910000L, 138790000L, 150840000L, 159260000L), efficiency = c(94.1423737426179, 
95.5334667159954, 97.9370369358487, 98.7592954990215, 101.064257028112, 
100.995355569648, 105.147126451164, 111.508806550707, 111.626908768987, 
112.020040567551, 112.40143243509)), row.names = c(NA, 11L), class = "data.frame")
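With the data above, the "independent regression before and after 2014" can be sketched without any breakpoint search by letting the `nudge` indicator interact with `year`. This is a minimal sketch under assumptions: the efficiency values are copied from the `dput()` output and rounded to two decimals, the names `eff`, `fit_its`, `fit_pre` and `fit_post` are hypothetical, and the plot is only drawn if `ggplot2` is installed.

```r
# Year, nudge indicator and efficiency copied from the dput() output above
# (efficiency rounded to two decimals)
eff <- data.frame(
  year = 2009:2019,
  nudge = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  efficiency = c(94.14, 95.53, 97.94, 98.76, 101.06, 101.00,
                 105.15, 111.51, 111.63, 112.02, 112.40)
)

# Interaction model: each phase gets its own intercept and its own slope
fit_its <- lm(efficiency ~ year * nudge, data = eff)

# Cross-check: fitting each phase separately gives the same two lines
fit_pre  <- lm(efficiency ~ year, data = subset(eff, nudge == 0))
fit_post <- lm(efficiency ~ year, data = subset(eff, nudge == 1))

eff$fitted_eff <- fitted(fit_its)

# Optional plot: observed points plus one fitted line per phase
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(eff, aes(x = year, y = efficiency)) +
    geom_point() +
    geom_line(aes(y = fitted_eff, group = nudge)) +
    geom_vline(xintercept = 2014, linetype = "dashed")
  print(p)
}
```

Because `nudge` switches to 1 in 2014, the first line covers 2009 to 2013 and the second covers 2014 to 2019. The `nudge` and `year:nudge` coefficients of `fit_its` then describe the change in level and slope between the two phases, and the same model can be fitted to the control group for comparison.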


Any suggestions on what to do? Thanks so much in advance!

This is what I have so far. The first segment is exactly what I need, but I need to 'skip' the second breakpoint:

Nico
  • Can you post sample data? Please edit the question with the output of `dput(vaa_data)`. Or, if it is too big, with the output of `dput(head(vaa_data, 20))`. – Rui Barradas Nov 06 '22 at 18:16
  • Hi Rui, thanks for your help. I just updated the question with the information you requested. – Nico Nov 10 '22 at 09:59
