4

I have a function Q(x|delta): R^n --> R for which I would like to fit a nonlinear quantile regression. The function Q(.) uses some matrix operations and it would be very complicated to do not use it. The problem is that it seems that the nlrq (nonlinear quantile regression) and the nls (non linear regression) do not work when there is matrix operations in the function used in the formula argument.

To illustrate, consider the simpler function F(x1,x2|a,b,c) which I can use in the formula argument for the nlrq and the nls functions when I do not use matrix operations, but which do not work in the formula argument when it was written with matrix operations.

    library('quantreg')

    ## Generating the data
    x1<- rnorm(200)
    x2<- rnorm(200)
    y<- 1+3*sin(x1)+2*cos(x2) +rnorm(200)
    Dat<- data.frame(y,x1,x2)

    ## The function F1 without matrix operation
    F1<- function(x_1, x_2, a, b,c){a+b*sin(x_1)+c*cos(x_2)}

    ## The function F2 with matrix operation
    F2<- function(x_1, x_2, a, b,c){t(c(1,sin(x_1),cos(x_2)))%*%c(a,b,c)}

    ## Both functions work perfectly
    F1(x_1=3, x_2=2, a=1, b=3,c=2)
    F2(x_1=3, x_2=2, a=1, b=3,c=2)

    ## But only F1 can be estimated by nls and nlrq
    nls_1<-nls(y ~ F1(x_1 = x1, x_2 = x2, a = 1, b, c),
               data = Dat, start = list(b = 3, c = 2))

    nlrq_1<-nlrq(y ~ F1(x_1 = x1, x_2 = x2, a = 1, b, c),
                 data = Dat, start = list(b = 3, c = 2), tau = 0.9)

    ## When F2 is used in the formula argument an error happens
    nls_2<-nls(y ~ F2(x_1 = x1, x_2 = x2, a = 1, b, c),
               data = Dat, start = list(b = 3, c = 2))

    nlrq_2<-nlrq(y ~ F2(x_1 = x1, x_2 = x2, a = 1, b, c),
                 data = Dat, start = list(b = 3, c = 2), tau = 0.9)

The error is Error in t(c(1, sin(x_1), cos(x_2))) %*% c(a, b, c) : non-conformable arguments. I believe that if someone manages to estimate F2 using matrix operations through nls and nlrq, I would be able to use the same solution in my other function.

The size of Dat is 200x3.

Thank you very much.

jogo
  • 12,469
  • 11
  • 37
  • 42
ThiagoSC
  • 69
  • 1
  • 9
  • Sorry, I've forgotten to mention that to use nlrq one needs the quantreg package then add `install.packages('quantreg')` and `library('quantreg')` to the code. – ThiagoSC Mar 15 '17 at 15:11

3 Answers3

3

Your function F2() doesn't work for vector arguments x_1x_2, ... because c(...) is constructing only a long vector (not a matrix).
See:

F1(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)

result:

#> F1(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)
#[1]  0.5910664 -3.1840601
#> F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)
#error in t(c(1, sin(x_1), cos(x_2))) %*% c(a, b, c) :  ...

The function nls() and nlrq() are sending vectors (i.e. columns from your dataframe Dat) to your functions F2() (resp. F1()).

Here are some vectorized definitions of F2():

# other definitions for F2()
F2 <- function(x_1, x_2, a, b,c) cbind(1,sin(x_1),cos(x_2)) %*% c(a,b,c)
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)

F2 <- function(x_1, x_2, a, b,c) t(rbind(1,sin(x_1),cos(x_2))) %*% c(a,b,c)
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)

F2 <- function(x_1, x_2, a, b,c) colSums(rbind(1,sin(x_1),cos(x_2)) * c(a,b,c))
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)

F2 <- function(x_1, x_2, a, b,c) crossprod(rbind(1,sin(x_1),cos(x_2)), c(a,b,c))
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)

F2 <- function(x_1, x_2, a, b,c) tcrossprod(c(a,b,c), cbind(1,sin(x_1),cos(x_2)))
F2(x_1=c(3,5), x_2=c(2,4), a=1, b=3,c=2)
jogo
  • 12,469
  • 11
  • 37
  • 42
  • I didn' t expect the function to work with vector arguments, but you gave me an idea of building the vectors inside of F2 with rbind instead of c and as a consequence the estimations worked! Thank you very much. – ThiagoSC Mar 15 '17 at 18:15
  • 1
    When I came back to my original problem I understood why you pointed out that it is a problem the function do not accept vectors as arguments. The nls and the nlrq functions use the whole vector of explanatory variables as arguments in the formula. – ThiagoSC Mar 16 '17 at 18:23
1

You can fall back to a general-purpose optimisation function for this. The usual default in R is optim, but there are many others.

Here's the case for the least-squares regression. The loss function is the sum of squared residuals. I've rewritten your F2 function so that it works for vector arguments.

sumsq <- function(beta)
{
    F2 <- function(x1, x2, a, b, c)
    {
        cbind(1, sin(x1), cos(x2)) %*% c(a, b, c)
    }
    yhat <- F2(Dat$x1, Dat$x2, beta[1], beta[2], beta[3])
    sum((Dat$y - yhat)^2)
}

beta0 <- c(mean(Dat$y), 1, 1)

optim(beta0, sumsq, method="BFGS")

#initial  value 731.387431 
#final  value 220.265745 
#converged
#$par
#[1] 0.8879371 3.0211286 2.1639280
# 
#$value
#[1] 220.2657
#
#$counts
#function gradient 
#      25        7 
#
#$convergence
#[1] 0
#
#$message
#NULL

Here, optim returns a list with a number of components. The component par is the value of the regression coefficients that minimises the sum of square residuals, which is in the component value.

If you compare with the result of nls, you can see that the estimated coefficients are approximately equal.

nls(y ~ F1(x_1=x1, x_2=x2, a=1, b, c),
           data=Dat, start=list(b=3, c=2))

Nonlinear regression model
  model: y ~ F1(x_1 = x1, x_2 = x2, a = 1, b, c)
   data: Dat
    b     c 
3.026 2.041 
 residual sum-of-squares: 221

Number of iterations to convergence: 1 
Achieved convergence tolerance: 7.823e-10

You can do something similar for quantile regression, but that would be more complicated.

Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
1

Based on the other answers I have figured out that the problem is building the vector inside F2 with a the c() function. When I used rbind() instead, the estimation worked perfectly both with nls() and nlrq().

Next I show the corrected version of F2.

    ## Changing c() for rbind()
    F2<- function(x_1, x_2, a, b,c){t(rbind(1,sin(x_1),cos(x_2)))%*%rbind(a,b,c)}

    ## Now nls() and nlrq() work properly
    nls_2<-nls(y ~ F2(x_1 = x1, x_2 = x2, a = 1, b, c),
       data = Dat, start = list(b = 3, c = 2))

    nlrq_2<-nlrq(y ~ F2(x_1 = x1, x_2 = x2, a = 1, b, c),
         data = Dat, start = list(b = 3, c = 2), tau = 0.9)

Notice that the estimations in nls_2 and nlrq_2 coincide with the ones from nls_1 and nlrq_1.

Thank you very much for your help.

ThiagoSC
  • 69
  • 1
  • 9