I'm trying to optimize spdep function of R for my use case since it is very slow for large databases. I was doing mostly fine but I stuck at one point, where I am trying to find trace of my weights matrix for LM error test. I think the formula is tr[(W' + W) W] (page 82 of Anselin, L., Bera, A. K., Florax, R. and Yoon, M. J. 1996 Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26, 77–104.) W is a square weights matrix, holding the spatial relation of each observation with another. tr() operation is the sum of the diagonals.
In my case, the weights matrix is symmetric and the diagonals are zero. So, I thought the formula tr[(W' + W) W] equals to 2*sumsq(W), which is super fast. But apparently I am mistaken somewhere because the results do not match the results of the spdep library, which is likely to be right.
The relevant part of the spdep library is here. Can anybody help me how the result of the following function differs from 2*sumsq(W) or how to make it much faster? This function is where the lm.LMtests function gets clogged for large data sets.
tracew <- function (listw) {
dlmtr <- 0
n <- length(listw$neighbours)
if (n < 1) stop("non-positive n")
ndij <- card(listw$neighbours)
dlmtr <- 0
for (i in 1:n) {
dij <- listw$neighbours[[i]]
wdij <- listw$weights[[i]]
for (j in seq(length=ndij[i])) {
k <- dij[j]
# Luc Anselin 2006-11-11 problem with asymmetric listw
dk <- which(listw$neighbours[[k]] == i)
if (length(dk) > 0L && dk > 0L &&
dk <= length(listw$neighbours[[k]]))
wdk <- listw$weights[[k]][dk]
else wdk <- 0
dlmtr <- dlmtr + (wdij[j]*wdij[j]) + (wdij[j]*wdk)
}
}
dlmtr
}
Additional explanation for those who are not familiar with spdep library of R: The input of the function, listw, holds a "graph" implementation of the weight matrix with two list of lists. listw$neighbors is a list, where each list item is a list of the indices of observations for which the observation has a relation to. listw$weights a list of the same structure with neighbors, except that it holds the weights of the relation.
Thanks in advance for any comments and directions.
# example code
# initiliaze
library(spdep)
library(multiway)
# load the tracew function above
data(columbus)
columbus = columbus[rep(row.names(columbus), 20), ] # the difference becomes dramatic when n is high. try not replicating at first to see the results.
# manual calculation, using sumsq
w = distm(cbind(columbus$X, columbus$Y))
w[w > 1000000] = Inf # remove some relations acc. to pre-defined rule
w = 1/(1+w)
diag(w) = 0
w = w / (sum(w) / length(columbus$X)) #"C style" standardization
2*sumsq(w)
# spdep calculation
neighs.band = dnearneigh(cbind(columbus$X, columbus$Y), 0, 1000, longlat = TRUE)
w.spdep = lapply(nbdists(neighs.band, cbind(columbus$X, columbus$Y), longlat = TRUE), function(x) 1/(0.001+x))
my.listw = nb2listw(neighs.band, glist = w.spdep, style="C")
tracew(my.listw)