
I am trying to fit a model having as predictor the variables TNST and Seff and as response the variable AUCMET. The result of the fitting is:

    mdl1 = 


Linear regression model:
    AUCMET ~ 1 + TNST + Seff

Estimated Coefficients:
                   Estimate    SE         tStat      pValue    
    (Intercept)     1251.5      72.176      17.34    1.4406e-58
    TNST           -2.3058     0.16045    -14.371    1.9579e-42
    Seff            13.087      1.0748     12.176    9.4907e-32


Number of observations: 932, Error degrees of freedom: 929
Root Mean Squared Error: 322
R-squared: 0.197,  Adjusted R-Squared 0.195
F-statistic vs. constant model: 114, p-value = 5.36e-45


The result of the ANOVA is:

anova(mdl1)

ans = 

             SumSq         DF     MeanSq        F         pValue    
    TNST     2.1395e+07      1    2.1395e+07    206.52    1.9579e-42
    Seff     1.5359e+07      1    1.5359e+07    148.25    9.4907e-32
    Error    9.6243e+07    929     1.036e+05  

The output of the diagnostic plot is

plotDiagnostics(mdl1)

[Image: diagnostic plot]

Could you help me interpret this result? I see that all the p-values are < 0.05, so the variables are important for the model. Is it a good model? What should I look at to understand it?

gabboshow
  • See my answer below. But if you post either your data or a graph, someone might be able to suggest a better model. – user1543042 Aug 10 '15 at 14:24

2 Answers


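R-squared is the coefficient of determination: for a least-squares fit with an intercept it equals the square of the Pearson correlation coefficient between the observed and the fitted values (adjusted R-squared additionally penalizes the number of predictors). https://en.m.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

A value of 1 is good and 0 is bad, so with about 0.2 I'd say it's a pretty bad model.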

user1543042
  • Thanks for your answer! Can I say that the model explains about 20% of the variance of the AUCMET variable? What other kind of model could I apply? – gabboshow Aug 10 '15 at 15:20

Edit: Now that you have edited the question with new information:

1- The diagnostic plot shows that a number of points have high leverage. But this plot does not reveal whether the high-leverage points are outliers. Try plotDiagnostics(mdl,'cookd') to find the points with a large Cook's distance, and remove them from the data if they turn out to be outliers.

2- The ANOVA table shows that both variables are significant, so you should not consider removing them.
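
The Cook's distance check in point 1 could be sketched roughly as follows. This is only an illustration: the data are synthetic stand-ins (not your data), the variable names simply mirror the question, and the "3 times the mean" cutoff is a common rule of thumb that I am assuming here, not a hard rule.

```matlab
% Synthetic stand-in data (NOT the asker's data); names mirror the question.
rng(0);                                   % reproducible random numbers
n      = 200;
TNST   = 400 + 50*randn(n,1);
Seff   = 80 + 10*randn(n,1);
AUCMET = 1250 - 2.3*TNST + 13*Seff + 300*randn(n,1);
tbl    = table(TNST, Seff, AUCMET);

mdl = fitlm(tbl, 'AUCMET ~ 1 + TNST + Seff');

% Cook's distance of every observation
d = mdl.Diagnostics.CooksDistance;

% Assumed rule of thumb: flag points above 3x the mean Cook's distance
threshold = 3*mean(d, 'omitnan');
flagged   = find(d > threshold);

plotDiagnostics(mdl, 'cookd');            % the plot suggested above

% Refit without the flagged rows and compare the coefficient estimates
mdl2 = fitlm(tbl, 'AUCMET ~ 1 + TNST + Seff', 'Exclude', flagged);
disp([mdl.Coefficients.Estimate, mdl2.Coefficients.Estimate]);
```

If the two coefficient columns barely differ, the flagged points are not driving the fit and there is little reason to drop them.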


Is a Low R-squared Bad?

No. In fields that try to predict human behavior (e.g. psychology), R-squared values are typically low because human behavior is hard to predict. Also, if the obtained R-squared is low but the predictions are good enough for your purpose, the model still counts as a good model. A low R-squared doesn't necessarily affect the interpretation of significant variables. How high should the R-squared be for prediction? That depends on your requirements for the width of a prediction interval and on how much variability is present in your data. While a high R-squared is required for precise predictions, it is not sufficient by itself. Conversely, high R-squared values are not inherently good: a high R-squared does not necessarily indicate that the model fits well.
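
To make the link between R-squared and prediction-interval width concrete, here is a small arithmetic sketch that uses only the numbers printed in the question's output. The 1.96*RMSE half-width is a rough approximation I am assuming for illustration; it ignores the uncertainty in the estimated coefficients.

```matlab
% Values copied from the fitlm / anova output in the question
n   = 932;            % number of observations
p   = 2;              % predictors (TNST, Seff), intercept not counted
dfe = n - p - 1;      % error degrees of freedom (929 in the output)
R2  = 0.197;          % R-squared
MSE = 1.036e5;        % error mean square from the anova table

adjR2 = 1 - (1 - R2)*(n - 1)/dfe;     % adjusted R-squared
F     = (R2/p) / ((1 - R2)/dfe);      % F-statistic vs. constant model
rmse  = sqrt(MSE);                    % root mean squared error

% Rough 95% prediction half-width (assumption: normal residuals,
% coefficient uncertainty ignored)
halfWidth = 1.96*rmse;

fprintf('adjusted R^2 = %.3f, F = %.1f, RMSE = %.0f\n', adjR2, F, rmse);
fprintf('approx. 95%% prediction interval: +/- %.0f\n', halfWidth);
```

The recomputed values agree with the reported 0.195, 114 and 322, and the approximate half-width is roughly twice the RMSE. Whether an interval that wide is acceptable depends on the scale of AUCMET in your application.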

What to do next?

To examine the quality of the model you can perform other tests, such as

  1. ANOVA

To examine the quality of the fitted model, consult an ANOVA table.

    tbl = anova(mdl)

  2. Diagnostic plots

Diagnostic plots help you identify outliers, and see other problems in your model or fit.

    plotDiagnostics(mdl)

  3. Residuals

There are several residual plots to help you discover errors, outliers, or correlations in the model or data. The simplest residual plots are the default histogram plot, which shows the range of the residuals and their frequencies, and the probability plot, which shows how the distribution of the residuals compares to a normal distribution with matched variance.

    plotResiduals(mdl)

  4. And more
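
The checks above could be combined in one short script, sketched here on synthetic stand-in data (not your data; the variable names mirror the question and the new observation values are made up). The predict call at the end is my addition: it reports a 95% prediction interval, which is often more informative than R-squared alone.

```matlab
% Synthetic stand-in data (NOT the asker's data); names mirror the question.
rng(1);
n      = 200;
TNST   = 400 + 50*randn(n,1);
Seff   = 80 + 10*randn(n,1);
AUCMET = 1250 - 2.3*TNST + 13*Seff + 300*randn(n,1);
tbl    = table(TNST, Seff, AUCMET);

mdl = fitlm(tbl, 'AUCMET ~ 1 + TNST + Seff');

aovTbl = anova(mdl);                          % ANOVA table, one row per term plus Error
disp(aovTbl);

figure; plotDiagnostics(mdl);                 % leverage plot (default)
figure; plotResiduals(mdl);                   % histogram of residuals (default)
figure; plotResiduals(mdl, 'probability');    % normal probability plot
figure; plotResiduals(mdl, 'fitted');         % residuals vs. fitted values

% 95% prediction interval for one hypothetical new observation
newObs = table(420, 85, 'VariableNames', {'TNST', 'Seff'});
[yhat, yint] = predict(mdl, newObs, 'Prediction', 'observation');
fprintf('predicted %.0f, 95%% interval [%.0f, %.0f]\n', yhat, yint(1), yint(2));
```

A funnel shape or curvature in the residuals-vs-fitted plot would suggest trying transformed variables or extra terms before trusting the intervals.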
NKN