3

I'm trying to re-write a current SAS program of mine in R, and I'm checking the output to make sure it matches. I'm starting with a very basic regression, and I can't even get that to match. I also double-checked the results in Excel, and it matched the R output.

My SAS code for the regression is very basic:

Proc Reg data=[data set];
 model DepVar = Reg1 Reg2 Reg3 Reg4 Reg5 Reg6;
run;

Here's a summary of the output:

VAR         SAS         R           Excel
DepVar       0.01748     0.01748     0.01748 
Reg1        (0.24815)   (0.24809)   (0.24809)
Reg2         1.19502     1.19481     1.19481 
Reg3        (0.33029)   (0.33012)   (0.33012)
Reg4         0.80502     0.80507     0.80507 
Reg5        (1.39338)   (1.39345)   (1.39345)
Reg6        (0.13034)   (0.13051)   (0.13051)

And here's the data (only 60 data points):

OBS DepVar  Reg1    Reg2    Reg3    Reg4    Reg5    Reg6
1   -0.0444 -0.0298 -0.0165 0.0266  0.032   0.0019  -0.0035
2   -0.0491 0.0165  -0.0072 0.0283  -0.0298 -0.0165 0.0266
3   0.1208  -0.0215 -0.0138 0.0175  0.0165  -0.0072 0.0283
4   -0.0784 -0.0278 -0.04   -0.0046 -0.0215 -0.0138 0.0175
5   0.2154  0.0353  0.0299  -0.0123 -0.0278 -0.04   -0.0046
6   0.1249  0.0045  0.0256  0.0278  0.0353  0.0299  -0.0123
7   0.0062  0.0379  0.0277  -0.0045 0.0045  0.0256  0.0278
8   0.0359  -0.0127 -0.0088 0.0141  0.0379  0.0277  -0.0045
9   0.2078  0.004   -0.0068 0.0116  -0.0127 -0.0088 0.0141
10  -0.123  -0.0214 -0.0103 -0.007  0.004   -0.0068 0.0116
11  -0.0633 0.0353  0.01    -0.0185 -0.0214 -0.0103 -0.007
12  0.0173  -0.0031 -0.0051 0.0048  0.0353  0.01    -0.0185
13  -0.0204 0.03    0.0533  0.0117  -0.0031 -0.0051 0.0048
14  -0.0143 -0.0033 -0.0031 -0.0085 0.03    0.0533  0.0117
15  0.1663  0.0142  0.0356  -0.0011 -0.0033 -0.0031 -0.0085
16  -0.099  0.0066  -0.0124 0.0308  0.0142  0.0356  -0.0011
17  -0.0148 -0.0358 -0.0304 0.0277  0.0066  -0.0124 0.0308
18  -0.0807 -0.0038 -0.0054 0.0151  -0.0358 -0.0304 0.0277
19  0.1532  -0.008  -0.0399 0.0327  -0.0038 -0.0054 0.0151
20  0.1195  0.0205  0.0083  -0.0176 -0.008  -0.0399 0.0327
21  -0.0581 0.0186  -0.0123 -0.0043 0.0205  0.0083  -0.0176
22  0.0034  0.0325  0.0164  0.0048  0.0186  -0.0123 -0.0043
23  0.0476  0.0175  0.0077  0.0048  0.0325  0.0164  0.0048
24  -0.0413 0.0086  -0.0089 0.0252  0.0175  0.0077  0.0048
25  0.0192  0.0143  0.0009  -0.0002 0.0086  -0.0089 0.0252
26  0.2577  -0.0197 0.0137  0.0024  0.0143  0.0009  -0.0002
27  0.0157  0.0071  -0.0026 0.0039  -0.0197 0.0137  0.0024
28  -0.0012 0.0353  -0.0209 -0.0097 0.0071  -0.0026 0.0039
29  0.0393  0.0323  -0.0003 -0.0015 0.0353  -0.0209 -0.0097
30  -0.0036 -0.0198 0.0076  -0.0107 0.0323  -0.0003 -0.0015
31  -0.0607 -0.0374 -0.0267 -0.0299 -0.0198 0.0076  -0.0107
32  0.0236  0.0094  -0.0014 -0.0236 -0.0374 -0.0267 -0.0299
33  -0.0363 0.0314  -0.0246 -0.0213 0.0094  -0.0014 -0.0236
34  -0.0442 0.0173  0.0021  -0.0197 0.0314  -0.0246 -0.0213
35  0.0758  -0.0485 -0.0277 -0.0109 0.0173  0.0021  -0.0197
36  -0.0076 -0.0097 0.0005  -0.0003 -0.0485 -0.0277 -0.0109
37  -0.0096 -0.065  -0.0078 0.0305  -0.0097 0.0005  -0.0003
38  0.0181  -0.0332 -0.0054 -0.0003 -0.065  -0.0078 0.0305
39  -0.056  -0.0112 0.0083  0.0028  -0.0332 -0.0054 -0.0003
40  -0.0404 0.0441  -0.0149 -0.0003 -0.0112 0.0083  0.0028
41  0.2678  0.0165  0.0298  -0.0034 0.0441  -0.0149 -0.0003
42  -0.0138 -0.0865 0.0107  -0.0102 0.0165  0.0298  -0.0034
43  -0.0568 -0.01   0.0358  0.0369  -0.0865 0.0107  -0.0102
44  -0.0234 0.0129  0.0375  0.0148  -0.01   0.0358  0.0369
45  -0.141  -0.0945 -0.0034 0.044   0.0129  0.0375  0.0148
46  -0.0227 -0.1754 -0.0228 -0.0299 -0.0945 -0.0034 0.044
47  -0.1332 -0.0813 -0.0363 -0.0494 -0.1754 -0.0228 -0.0299
48  0.1535  0.015   0.0397  -0.012  -0.0813 -0.0363 -0.0494
49  0.0309  -0.0844 -0.0098 -0.0986 0.015   0.0397  -0.012
50  0.0529  -0.1042 -0.0035 -0.069  -0.0844 -0.0098 -0.0986
51  -0.0834 0.0868  0.0073  0.026   -0.1042 -0.0035 -0.069
52  0.0413  0.0986  0.054   0.0542  0.0868  0.0073  0.026
53  -0.0006 0.0486  -0.0266 0.0056  0.0986  0.054   0.0542
54  0.0159  0.0009  0.0267  -0.0244 0.0486  -0.0266 0.0056
55  -0.0506 0.0738  0.025   0.0473  0.0009  0.0267  -0.0244
56  0.05    0.0299  -0.0051 0.0759  0.0738  0.025   0.0473
57  0.009   0.0376  0.0247  0.014   0.0299  -0.0051 0.0759
58  0.0344  -0.0293 -0.0422 -0.0437 0.0376  0.0247  0.014
59  0.0038  0.0523  -0.0265 0.0017  -0.0293 -0.0422 -0.0437
60  0.1589  0.0239  0.0579  0.0073  0.0523  -0.0265 0.0017

What am I missing?

user42719
  • 41
  • 1
  • 3
  • Can you print the descriptive stats from SAS and Excel such as mean of Reg1 var? Also, you can print the regression OUTPUTs. This will help you catch trivial errors, such as reading not all observations etc. – Aksakal almost surely binary Dec 11 '14 at 22:23
  • For the record, none of these software platforms appears to be perfectly accurate. Converting the numbers *as listed in this question* to ratios of integers and computing an exact solution in *Mathematica*, and then converting its solution to floating point, gives the coefficients $0.0174833,-0.248087,1.19481,-0.330125,0.805075,-1.39345,-0.130512$. (Possibly the data presented here are rounded versions of what these programs are working with.) – whuber Dec 11 '14 at 22:24
  • Possible duplicates include http://stats.stackexchange.com/questions/38379 and http://stats.stackexchange.com/questions/113314, both of which discuss floating point imprecision in statistical output. – whuber Dec 11 '14 at 22:28

4 Answers4

5

Double check your data in SAS and make sure they have the same precision, etc. I used your data and SAS and obtained identical results as your R and Excel outputs:

enter image description here

And this is Stata output, if that helps with verifying:

enter image description here

Penguin_Knight
  • 1,289
  • 8
  • 13
  • 2
    Your observation might be related to the phenomenon discussed at http://stats.stackexchange.com/questions/113314: depending on how the data were input to the programs, they might have been working on slightly different versions! Seeing the standard errors here is very useful: it puts the tiny discrepancies in proper perspective. (+1) – whuber Dec 11 '14 at 22:30
  • @whuber thanks! If that would help, I used Window Vista and SAS 9.3. The way I input the data was through saving the data posted here as a csv (added the "," to replace the spaces) and then use proc import to read the table. – Penguin_Knight Dec 11 '14 at 22:32
  • It is nice to see that now SAS also is obtaining--to the precision shown--the *exact* results computed using rational arithmetic. – whuber Dec 11 '14 at 22:41
2

If I read your output correctly, then the differences show up in the fourth significant digit or even later - for only 60 data points. With only 60 data points, all measured to no more than two or three significant digits, you should not even look at anything beyond the third significant digit in your output. Anything "out there" will be swamped by measurement noise.

Matrix inversion (more precisely, finding solutions to linear equations) is not an exact science in floating point arithmetic. Using different numerical libraries, which may use different algorithms for solving linear equations, or even the same libraries on different architectures (which I assume is not the case for you) can certainly cause divergences on the order you are observing. Check R FAQ 7.31 for more info. Using special exact arithmetic libraries should in principle yield the same results, but I don't even know whether OLS solutions are available in R/SAS/Excel with exact arithmetic.

Stephan Kolassa
  • 7,953
  • 2
  • 28
  • 48
  • 3
    The difference in matrix inversion precision would be close to [EPS](http://en.wikipedia.org/wiki/Machine_epsilon). You'll notice it only in extreme cases such as ill-conditioned matrices, or if you are running float vs. double precision. – Aksakal almost surely binary Dec 11 '14 at 22:03
  • 2
    I downvoted because while a generally valid point, this post is misleading in this context. @Aksakal is right on this one. The differences are very large to be due to numerics. We have a full rank design matrix; QR will have no problem to go all the way down to 'eps'. I ran this problem on R and MATLAB (which use ATLAS and Intel's MKL respectively) and I got same results down to 16 digits (eg. the intercept was 0.0174833225805475 on both occasions) (Please don't be offended, your posts that I have are almost always very helpful, just this one is a bit off!) – usεr11852 Dec 12 '14 at 00:17
  • @usεr11852: thanks for explaining your downvote and for a very helpful comment! I do like to learn, and it helps if people point out where I went wrong. – Stephan Kolassa Dec 12 '14 at 08:39
1

This is the precision difference. My guess is that PROC REG is using MLE, while R and Excel are using matrix factorization route. When using linear algebra the precision is pretty much set to close to machine precision. In MLE you set the precision, then optimization routine will try to match it.

Another guess is the conversion from character to number and rounding around it.

1

Thanks everyone for your input. It appears to be something going on with the data as it works its way through the SAS program. I had originally taken a couple data sources and combined them into a single SAS dataset, and then I exported that dataset to R and Excel, which is when the differences occurred. I find now that if I do the combining of the original data sets in R and then run the regression, I get the original SAS answer. Also, I find (as someone above noted) that if I take the copied data and run that through SAS, I get the original R answer.

So the data is being changed somewhere along the line in the SAS program. However, I can't quite figure out how, since the precision of the original data is only what's shown in my original post.

Nevertheless, this is helpful. Thanks!

user42719
  • 41
  • 1
  • 3