
I am using the `lmrob` function from the robustbase package in R for robust regression. I call it as `rob_reg <- lmrob(y ~ 0 + ., dat, method = "MM", control = a1)`. When I want the summary I use `summary(rob_reg)`, and one thing robust regression does is identify outliers in the data. Part of the summary output gives me the following:

6508 observations c(49,55,58,77,104,105,106,107,128,134,147,153,...) are outliers with |weight| <= 1.4e-06 ( < 1.6e-06);

which lists all the outliers, in this case 6508 (I removed the majority and replaced them with ...). I need to somehow get these outliers and remove them from my data. What I did before was to use `summary(rob_reg)$rweights` to get the weights for all observations and remove those with a weight below a certain value; in the example above the value would be 1.6e-06. Is there a way to get a list of only the outliers without first getting the weights of all the observations?

Jason Samuels
  • The code that prints the outliers for `summary()` is actually in `summarizeRobWeights()`, and it does the same thing as you. It extracts the rweights and returns those where `abs(weight) < eps`. It only seems to return the summary table and not the values themselves. – MrFlick Jun 27 '14 at 20:23
  • Robust regression is not really intended as an outlier test. It's primarily a (recommended) way to *deal* with the presence of outliers. Removing 6508 values as outliers from a dataset seems like a really bad idea. – Roland Jun 28 '14 at 07:13
  • I need to take out the outliers and run a normal regression again with `lm`. With the outliers the error terms are not normally distributed, and I need to show that without the outliers they are normally distributed. 6508 is only a small share of my observations, as I have about 350 00 observations in total. – Jason Samuels Jun 28 '14 at 09:44
  • I take issue with Roland. Robust regression is designed to identify outliers more precisely than OLS. Within OLS some outliers can be masked because of their influence on the regression coefficients (they tilt the regression trend line in their direction, so the outliers are already muted somewhat). Robust regression, by underweighting the outliers, causes them to have a lesser impact on the regression trend line and thus to lie further away from it. So they show up more distinctly than in OLS. – Sympa Dec 30 '15 at 18:15

1 Answer


This is an old post but I recently had a need for this so I thought I'd share my solution.

    library(robustbase)

    # fit the model
    fit = lmrob(y ~ x, data)
    # create a model summary
    fit.summary = summary(fit)

    # extract the outlier threshold from the summary;
    # eps.outlier may be stored as a function of the number of observations
    out.thresh = fit.summary$control$eps.outlier
    if (is.function(out.thresh)) out.thresh = out.thresh(length(fit.summary$rweights))

    # returns the weights corresponding to the outliers
    # names(out.liers) corresponds to the row names of the observations
    out.liers = fit.summary$rweights[which(abs(fit.summary$rweights) <= out.thresh)]

    # add a True/False variable for outlier to the original data by matching
    # row.names of the original data to names of the list of outliers
    data$outlier = ifelse(row.names(data) %in% names(out.liers), "True", "False")
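
If all you need is the row positions so you can refit with `lm`, as asked in the comments, something along these lines might work. It is only a sketch on simulated data: `dat`, `out.idx`, `dat.clean` and `fit.ols` are names made up for illustration, and it assumes no rows were dropped for missing values, so that positions in `rweights` line up with rows of the data.

```r
library(robustbase)

set.seed(42)
# Simulated data (an assumption for illustration): a linear trend plus 10 gross outliers
n   <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)
dat$y[1:10] <- dat$y[1:10] + 50

fit <- lmrob(y ~ 0 + x, data = dat, method = "MM")

# Robustness weights, one per observation used in the fit
w <- fit$rweights

# Cut-off taken from the fit's control settings;
# the entry may be a plain number or a function of the number of observations
eps <- fit$control$eps.outlier
if (is.function(eps)) eps <- eps(length(w))

# Row positions of the flagged observations
out.idx <- which(abs(w) <= eps)

# Refit by ordinary least squares without the flagged rows
dat.clean <- if (length(out.idx) > 0) dat[-out.idx, , drop = FALSE] else dat
fit.ols   <- lm(y ~ 0 + x, data = dat.clean)
```

This still reads the full weight vector once, which matches what MrFlick describes `summary()` doing internally. The `length(out.idx) > 0` guard matters because `dat[-integer(0), ]` would return zero rows rather than the full data.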