I have a scatterplot in R with time on the x axis and cost on the y axis. I want to find a constant line (y=?) that minimizes the sum of the absolute deviations (the vertical distances) from all the points to that line. The actual data isn't important; the code below uses mtcars as a stand-in, with wt on the x axis and disp on the y axis.
# Example data: mtcars
plot(mtcars$wt, mtcars$disp)

# Candidate constant lines: every integer from min(disp) to max(disp)
candidates <- as.integer(min(mtcars$disp)):as.integer(max(mtcars$disp))
sum_df <- data.frame()
for (i in candidates) {
  # Absolute deviation of each point from the candidate line y = i
  sum_var <- list()
  for (j in seq_along(mtcars$disp)) {
    sum_var[[j]] <- abs(i - mtcars$disp[j])
  }
  sum_var <- sum(unlist(sum_var))
  # Append (not prepend) so each row stays matched to its candidate value
  sum_df <- rbind(sum_df, data.frame(disp = i, abs_sum = sum_var))
}
# Flag the candidate(s) with the smallest sum of absolute deviations
sum_df$best_line <- ifelse(sum_df$abs_sum == min(sum_df$abs_sum), "Best Line", "")
colnames(sum_df) <- c("Disp", "Abs Sum of Var", "Best Line")
I know I could loop through different constant lines, compute the sum of absolute deviations for each, and then pick the line that fits best, as above. However, I have a lot of data points and I am already looping through a lot of graphs. Is there a better way to code this than the brute-force method?
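For comparison, here is a minimal sketch of the same search without the nested loops, assuming the quantity of interest really is the sum of absolute deviations as in the code above. The names `y`, `candidates`, `abs_sum`, `best_brute`, `best_abs` and `best_sq` are made up for illustration. It also relies on a general property rather than anything specific to this data: the median minimizes the sum of absolute deviations, while the mean minimizes the sum of squared deviations.

```r
# Illustrative sketch only; variable names are hypothetical.
y <- mtcars$disp
candidates <- as.integer(min(y)):as.integer(max(y))

# Sum of absolute deviations from the constant line y = k, one value per candidate k
abs_sum <- vapply(candidates, function(k) sum(abs(k - y)), numeric(1))
best_brute <- candidates[which.min(abs_sum)]

# Closed-form alternatives (general properties, assumed to match the goal here):
# the median minimizes the sum of absolute deviations,
# the mean minimizes the sum of squared deviations.
best_abs <- median(y)
best_sq <- mean(y)
```

With an even number of points, every value between the two middle observations gives the same minimal sum, so `best_brute` and `best_abs` may differ while being equally good. Either one could then be drawn on the existing plot with `abline(h = best_abs)`.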