I would like to estimate a spatial lag model (spatial autoregressive regression) with spdep::lagsarlm in R. My observations (n=447) are polygons, each representing an administrative region of Berlin.

However, the problem is that the regions have a highly varying number of inhabitants (between 500 and 32000). Therefore, I would like to weight each observation with its number of inhabitants. With lm this is easypeasy, because it accepts the optional argument weights=...

How can I do something similar with spdep::lagsarlm? Is there a workaround?

rafa.pereira
chamaoskurumi
  • What happened to your answer? I thought it was a good addition. – LyzandeR Mar 05 '15 at 20:12
  • Maybe think heteroskedastic errors, and look at package `sphet` and the two corresponding JSS papers? – Edzer Pebesma Mar 06 '15 at 11:26
  • @LyzandeR: I deleted it because the author did not give me permission to post it on Stack. – chamaoskurumi Mar 09 '15 at 10:19
  • @Gui_struggling_with_R That's a pity... I am sure it might help someone. Anyway, it's his choice.. Thanks for the reply. – LyzandeR Mar 09 '15 at 10:29
  • According to Roger Bivand, this has not been developed yet (as of 2015). See his response here. http://r-sig-geo.2731867.n2.nabble.com/Weighting-observations-in-a-spdep-lagsarlm-Model-td7587871.html – rafa.pereira Dec 29 '17 at 12:05

1 Answer

I haven't used spdep::lagsarlm, but it is easy to reproduce the effect of the weights argument in lm with the following method:

Let's assume you have a data.frame df defined as:

df <- data.frame(a=runif(10), b=runif(10))

> df
           a          b
1  0.8266429 0.43591733
2  0.4624063 0.93180891
3  0.7085656 0.36468984
4  0.3339251 0.79093356
5  0.8236406 0.39687242
6  0.8266429 0.83213817
7  0.4624063 0.34714824
8  0.7085656 0.01812133
9  0.3339251 0.54498829
10 0.8236406 0.73677156

and a weights vector defined as:

c(1,1,1,1,2,2,2,2,2,2)

Running an lm on the above data produces the following results:

> lm(a~b, data=df, weights=c(1,1,1,1,2,2,2,2,2,2))

Call:
lm(formula = a ~ b, data = df, weights = c(1, 1, 1, 1, 2, 2, 
    2, 2, 2, 2))

Coefficients:
(Intercept)            b  
     0.6672      -0.0467  

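Let's now see what the weights vector means to lm. With positive integer weights, the weighted fit gives the same coefficients as an unweighted fit on data in which each row is repeated as many times as its weight.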

We start by replicating the rows of the data.frame df by the number defined in the weights like this:

replicate_rows <- rep(1:nrow(df), c(1,1,1,1,2,2,2,2,2,2))

Rows with a weight of 2 appear twice as you can see below:

> replicate_rows
 [1]  1  2  3  4  5  5  6  6  7  7  8  8  9  9 10 10

Use the above to make a new data.frame df2 that uses those rows:

df2 <- df[replicate_rows, ]

> df2
             a          b
1    0.8266429 0.43591733
2    0.4624063 0.93180891
3    0.7085656 0.36468984
4    0.3339251 0.79093356
5    0.8236406 0.39687242
5.1  0.8236406 0.39687242
6    0.8266429 0.83213817
6.1  0.8266429 0.83213817
7    0.4624063 0.34714824
7.1  0.4624063 0.34714824
8    0.7085656 0.01812133
8.1  0.7085656 0.01812133
9    0.3339251 0.54498829
9.1  0.3339251 0.54498829
10   0.8236406 0.73677156
10.1 0.8236406 0.73677156

I have replicated the rows of the dataframe df according to the weights. Let's run an lm now without the use of weights:

> lm(a~b, data=df2)

Call:
lm(formula = a ~ b, data = df2)

Coefficients:
(Intercept)            b  
     0.6672      -0.0467  

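As you can see, the coefficients are exactly the same! (The standard errors are not, because the replicated data pretends to contain more observations than were actually made.)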
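The comparison can also be checked programmatically rather than by eye. The sketch below is a minimal check under stated assumptions: the seed and the object names (`w`, `fit_weighted`, `fit_replicated`) are made up for illustration, and the random data will not match the numbers printed above. It confirms that the coefficients agree and extracts the standard errors of both fits so they can be compared.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(a = runif(10), b = runif(10))
w  <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Weighted fit on the original 10 rows
fit_weighted <- lm(a ~ b, data = df, weights = w)

# Unweighted fit on the data with rows repeated according to w
df2 <- df[rep(seq_len(nrow(df)), w), ]
fit_replicated <- lm(a ~ b, data = df2)

# The coefficient estimates agree up to numerical tolerance
coef_equal <- isTRUE(all.equal(coef(fit_weighted), coef(fit_replicated)))

# The standard errors do not: the replicated fit has 16 - 2 = 14 residual
# degrees of freedom, the weighted fit only 10 - 2 = 8
se_weighted   <- coef(summary(fit_weighted))[, "Std. Error"]
se_replicated <- coef(summary(fit_replicated))[, "Std. Error"]
```

Printing `se_weighted` and `se_replicated` side by side shows that the replicated fit reports smaller standard errors, so the replication trick should only be relied on for the point estimates.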

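You can use the above to weight your data.frame accordingly and then use it in your spdep::lagsarlm call. Keep in mind that the listw object passed to lagsarlm must have one entry per row of the data, so it would have to be rebuilt for the replicated data.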
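One thing to watch, as a comment below also points out, is that replicating rows changes the number of observations, so the spatial weights built for the original polygons no longer line up with the data. The base-R sketch below uses a made-up adjacency matrix `W` for four regions on a line (purely hypothetical, not the Berlin data) to show the mismatch and one naive way to expand the matrix with the same index vector. Whether a spatial lag model fitted on such an expanded structure is statistically sensible is not tested here.

```r
# Hypothetical binary adjacency matrix for 4 regions arranged 1-2-3-4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)            # make it symmetric

# Hypothetical integer weights and the resulting row index
w <- c(1, 2, 1, 2)
replicate_rows <- rep(seq_len(nrow(W)), w)

# The replicated data would have 6 rows, but W is still 4 x 4
length(replicate_rows)
dim(W)

# Naive expansion: index rows and columns with the same vector
W2 <- W[replicate_rows, replicate_rows]
dim(W2)

# Copies of the same region are NOT neighbours of each other here,
# because the diagonal of W is zero
W2[2, 3]
```

An expanded matrix like `W2` could presumably be converted into a listw object with `spdep::mat2listw`, but the treatment of duplicated locations (zero distance, no link between copies) is an open modelling question rather than a settled recipe.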

LyzandeR
  • Do you have any paper backing this up (or am I doomed to go through the literature cited in `?lm`)? – Roman Luštrik Mar 05 '15 at 13:41
  • @RomanLuštrik No I don't have any paper backing this up unfortunately. The literature in `lm` doesn't specify anything else either. But this is the way the weights argument is used in `lm` as shown above. – LyzandeR Mar 05 '15 at 13:56
  • It makes sense, in an intuitive way: just as outliers have "weights" to them, you can give more influence to existing data points by duplicating, triplicating, etc. them. Thanks for the insight. – Roman Luštrik Mar 05 '15 at 14:00
  • @RomanLuštrik -- There's also a sentence to that effect in `?lm`, which might give you even more confidence that that's what `weights` is about. Here's a snippet from it: "'weights' can be used to indicate, [when the elements of 'weights' are positive integers w_i], that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized)". Pretty nifty, isn't it? – Josh O'Brien Mar 05 '15 at 14:07
  • @JoshO'Brien Thanks for the comment. That makes it clearer. I'll update if I find anything else as well. I remember reading about it somewhere too (not in a paper though). – LyzandeR Mar 05 '15 at 14:10
  • In `lm` it shouldn't because that is how the weights are implemented anyway. I haven't used `spdep::lagsarlm` but it shouldn't affect them in a negative way since you only alter the number of rows according to the weights. And thanks :) – LyzandeR Mar 05 '15 at 15:46
  • What works for basic linear regression shouldn't be expected to necessarily work here. I would imagine that creating duplicates of rows (or locations in this case) would screw up the matrix of weights that the SAR regression creates. – tphilli Feb 25 '21 at 09:57