0

Here is the dput output of my dataset in R:

data1<-structure(list(Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 
1999, 2002, 1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000), 
    `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D", 
    "E", "E", "F", "F", "G", "G", "H", "H"), Industry = c("AUTO", 
    "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", 
    "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", 
    "Pharma", "Pharma"), X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 
    13, 15, 16, 17, 18, 19, 20, 21), Y = c(30, 31, 34, 35, 36, 
    38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50), Z = c(23, 
    29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 
    137, 143)), row.names = c(NA, -17L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`1` = 1L), class = "omit"))

I am trying to regress Y ~ X + Z for each industry year while excluding firm i's own observations. For each firm, I want to estimate the linear regression model using all observations of its industry peer firms but excluding the firm's own observations. For example, for firm A, I want to regress Y ~ X + Z using all observations of its industry peers (B, C and D) across time, excluding firm A's observations. Similarly, I want to run the model for firm B using all observations of firms A, C and D (in the same industry as B) across time, excluding firm B's observations, and follow the same procedure for firms C and D. I want to do this for every firm within each industry. Please help.

ThomasIsCoding

2 Answers

1

As mentioned by @bonedi, you can use a nested loop to accomplish this. If you want to create models for individual industry-year combinations, you will need to subset your data by Industry and Year. Within each subset, you can loop over Firm name and exclude that firm before fitting the model. The results can be stored in a list whose elements are named industry-year-firm. It's not a pretty solution, but it should get you closer.

# Fitted models, keyed by "industry-year-firm"
lst <- list()

for (ind in unique(data1$Industry)) {
  for (year in unique(data1[data1$Industry == ind, ]$Year)) {
    for (firm in unique(data1[data1$Industry == ind & data1$Year == year, ]$`Firm name`)) {
      # Peers: same industry and year, excluding the current firm
      sub_data <- data1[data1$Industry == ind & data1$Year == year & data1$`Firm name` != firm, ]
      # Skip industry-year cells in which the firm has no peers
      if (nrow(sub_data) > 0) {
        name <- paste(ind, year, firm, sep = '-')
        lst[[name]] <- lm(Y ~ X + Z, data = sub_data)
      }
    }
  }
}
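
If the peers should instead be pooled across all years, as in the firm A example in the question, a minimal base-R sketch could look like the following. It rebuilds the question's data as a plain data.frame; `firms` and `peer_models` are names made up for this illustration, not part of the original code.

```r
# Sample data from the question, rebuilt as a plain data.frame
data1 <- data.frame(
  Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 1999, 2002,
           1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000),
  `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D",
                  "E", "E", "F", "F", "G", "G", "H", "H"),
  Industry = rep(c("AUTO", "Pharma"), times = c(9, 8)),
  X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21),
  Y = c(30, 31, 34, 35, 36, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50),
  Z = c(23, 29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131,
        137, 143),
  check.names = FALSE  # keep the space in `Firm name`
)

# One row per (industry, firm) pair; `firms` is a hypothetical helper table
firms <- unique(data1[, c("Industry", "Firm name")])

# For each firm, pool all years of its industry and drop the firm's own rows
peer_models <- lapply(seq_len(nrow(firms)), function(i) {
  peers <- data1[data1$Industry == firms$Industry[i] &
                   data1$`Firm name` != firms$`Firm name`[i], ]
  lm(Y ~ X + Z, data = peers)
})
names(peer_models) <- paste(firms$Industry, firms$`Firm name`, sep = "-")

# Example: the model for firm A is fitted on firms B, C and D only
coef(peer_models[["AUTO-A"]])
```

In this sample data Z equals 6 * X + 17 exactly, so `lm` reports NA for the Z coefficient; with data that is not collinear, all three coefficients would be estimated.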
Ben
  • There would be a nice tidyverse approach to the multiple equations. First, a loop could make n data frames, each excluding firm i and carrying a column that states which firm was excluded; then row-bind them into one data frame and pass it to the tidyverse multiple-equations code. A side bonus is that the data and models would be ready for ggplot(). – Mark Neal Apr 25 '20 at 19:54
  • @Ben Thanks for the help. I was able to get the nested loop working. The only issue is that my dataset is fairly large, and the three nested loops take a long time to execute. Is there a way to make the nested loop code above more efficient or robust? – Abhinav Sharma May 01 '20 at 17:51
  • Have you looked into a faster `lm`? See this post: https://stackoverflow.com/questions/25416413/is-there-a-faster-lm-function – Ben May 01 '20 at 18:04
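
On the speed question, one option in base R is `lm.fit`, the fitting routine that `lm` itself calls. It takes a design matrix and a response directly and skips the formula handling. Below is a rough sketch on made-up data; the `toy` data frame and the helper `fit_peers_fast` are hypothetical, and no speed-up has been measured here.

```r
# Made-up data with the same column layout as the question (not the real dataset)
set.seed(1)
toy <- data.frame(
  Industry = rep(c("AUTO", "Pharma"), each = 12),
  `Firm name` = rep(LETTERS[1:8], each = 3),
  X = rnorm(24),
  Z = rnorm(24),
  check.names = FALSE  # keep the space in `Firm name`
)
toy$Y <- 2 + 0.5 * toy$X - 1.5 * toy$Z + rnorm(24, sd = 0.1)

# Hypothetical helper: coefficients of Y ~ X + Z fitted on a firm's industry
# peers, using lm.fit() on a hand-built design matrix
fit_peers_fast <- function(d, firm, industry) {
  keep <- d$Industry == industry & d$`Firm name` != firm
  x <- cbind(`(Intercept)` = 1, X = d$X[keep], Z = d$Z[keep])
  lm.fit(x, d$Y[keep])$coefficients
}

fast <- fit_peers_fast(toy, firm = "A", industry = "AUTO")

# Same peers fitted with lm() for comparison
slow <- coef(lm(Y ~ X + Z,
                data = toy[toy$Industry == "AUTO" & toy$`Firm name` != "A", ]))
all.equal(fast, slow)
```

`lm.fit` returns only the raw fit (coefficients, residuals and so on), so helpers such as `summary()` or `predict()` will not work on its result; whether that trade-off is worthwhile depends on what is needed from each model.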
0

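The displayed code isn't easy to read. But from what you write, I'd recommend a nested loop, e.g.:

fits <- list()
for (y in unique(data1$Year)) {
  for (comp in unique(data1$`Firm name`[data1$Year == y])) {
    # Transform data: select only companies in comp's industry, but exclude comp
    ind <- data1$Industry[data1$`Firm name` == comp][1]
    peers <- data1[data1$Year == y & data1$Industry == ind & data1$`Firm name` != comp, ]
    if (nrow(peers) > 0) fits[[paste(y, comp, sep = "-")]] <- lm(Y ~ X + Z, data = peers)
  }
}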
bonedi