2

I am writing a C version of my R mcmc code. Part of the code that is throwing up errors is as follows:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <stddef.h>
#include <limits.h>

#include <gsl/gsl_machine.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_cdf.h>

#include <R.h>
#include <Rmath.h>
#include <Rembedded.h>
#include <Rdefines.h>
#include <R_ext/BLAS.h>
#include <R_ext/Lapack.h>
#include <R_ext/Linpack.h>
//#include <vecLib/cblas.h>      /* C BLAS in APPLE */
#include "blaio.h"           /*Linear Algebra I/O using http://www.mitchr.me/SS/exampleCode/blas.html*/

double rTruncNorm (double, double*, double, double);

// C version of group lasso
void BinaryBayesianGroupLasso (double *X, int *dim, int *y, int *iterations, double *bprev, double *Lprev) {

    // X is design matrix with intercept
    //dim is dimensions of X
    int iter = 1, i;
    int n = dim[0];
    int p = dim[1];
    double sigma = 1, z[n], mu[n], u, a, b;

    // random number generation set up
    const double lowest_double = -GSL_DBL_MAX;
    const double highest_double = GSL_DBL_MAX;
    const gsl_rng *gBaseRand;       /* global rand number generator */
    unsigned long randSeed;    

    /* specifying to use Mersenne twister as the PRNG */
    gBaseRand = gsl_rng_alloc(gsl_rng_mt19937);
    srand(time(NULL));                    /* initialization for rand() */
    randSeed = rand();                    /* returns a non-negative integer */
    gsl_rng_set(gBaseRand, randSeed);    /* seed the PRNG */

    while(iter < iterations){

        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, p, 1, X, p, bprev, 1, 0, mu, 1);
        printVector(n, mu, 8, 3, NULL, NULL, NULL, "       mu"); // width=8, precision=3

        for (i = 0; i < n; i++) {
            u = gsl_ran_flat(gBaseRand, 0, 1);
            if(y[i]==1){
                a = 0.0 - mu[i];
                b = highest_double;
            } else {
                a = lowest_double;
                b = 0.0 - mu[i];
            }
            z[i] = mu[i] + rTruncNorm(sigma, &u, a, b);            
        }

        printVector(n, z, 8, 3, NULL, NULL, NULL, "       mu"); // width=8, precision=3

    }
    gsl_rng_free(gBaseRand);
}



// gaussian truncated normal between [a,b], params - mu=0, sigma, unifrnd = u
double rTruncNorm(double sigma, double *u, double a, double b){

    double unifr = *u;
    double Fa = gsl_cdf_gaussian_P(a, sigma);
    double Fb = gsl_cdf_gaussian_P(b, sigma);
    double v = Fa + (Fb-Fa)*unifr;
    double x = gsl_cdf_gaussian_Pinv(v, sigma);
    return(x);
}

The functions in blaio.h have already been built using the makefile in the website, I got them from.

I am using the following to compile:

R CMD SHLIB -lgsl -lgslcblas BinaryBayesianGroupLasso.c

It is throwing the following errors:

gcc -arch x86_64 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/x86_64 -DNDEBUG  -I/usr/local/include    -fPIC  -g -O2  -c BinaryBayesianGroupLasso.c -o BinaryBayesianGroupLasso.o
BinaryBayesianGroupLasso.c: In function ‘BinaryBayesianGroupLassoC’:
BinaryBayesianGroupLasso.c:36: error: nested functions are disabled, use -fnested-functions to re-enable
BinaryBayesianGroupLasso.c:36: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘gBaseRand’
BinaryBayesianGroupLasso.c:37: warning: implicit declaration of function ‘time’
BinaryBayesianGroupLasso.c:41: warning: comparison between pointer and integer
BinaryBayesianGroupLasso.c:50: warning: implicit declaration of function ‘cblas_dgemv’
BinaryBayesianGroupLasso.c:50: error: ‘CblasRowMajor’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:50: error: (Each undeclared identifier is reported only once
BinaryBayesianGroupLasso.c:50: error: for each function it appears in.)
BinaryBayesianGroupLasso.c:50: error: ‘CblasNoTrans’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:50: error: ‘mu’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:51: warning: implicit declaration of function ‘printVector’
BinaryBayesianGroupLasso.c:70: error: ‘i’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:71: error: ‘u’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:72: error: ‘y’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:73: error: ‘a’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:74: error: ‘b’ undeclared (first use in this function)
BinaryBayesianGroupLasso.c:79: warning: implicit declaration of function ‘rTruncNorm’
BinaryBayesianGroupLasso.c:91: warning: passing argument 1 of ‘gsl_rng_free’ discards qualifiers from pointer target type
BinaryBayesianGroupLasso.c:92: warning: ‘return’ with a value, in function returning void
BinaryBayesianGroupLasso.c: At top level:
BinaryBayesianGroupLasso.c:98: error: conflicting types for ‘rTruncNorm’
BinaryBayesianGroupLasso.c:79: error: previous implicit declaration of ‘rTruncNorm’ was here
make: *** [BinaryBayesianGroupLasso.o] Error 1

I have no nested function in C. Why is it throwing up such an error? If it is a linking problem, should it not throw up function not found kind of an error?

Edit: I was asked to post the solution. In my initial code, I was not compiling the above code (but an older version in another directory). (Yup, that is the stupidest error possible). I am reposting a working version of the code. I do not think it would be of much value, other than seeing an example of how mcmc sampling steps are coded up, and how its called from R. This code is however incomplete in terms of Grouped Bayesian Lasso. It also has codes (with sources attributed for printing matrices used in blas). I have started coding in C in just about a week, so take everything (except the print codes) with a pinch of salt. If you have much a prettier version in mind (like putting separate files for printing, and the compiling command), go ahead and make changes.

I have learned 3 things in this exercise:

  • calling C from R

  • using blas

  • using gsl

The code will evolve to address a 4th point later: - calling R from C

I have to do a sampling from generalized inverse gaussian in this model (excluded in this code). However I do not see an easy piece of function that does that in C. I might have to use R's function then. So it becomes weird. R calls this C function, and within this C function a few other R functions will be called. Your comments will be appreciated, if there are any alternate approaches, and if this is the way, how to do it - RInside? anything easier?

Time is of prime importance to me. I have to run this model with about a million data points, with at least 10,000 mcmc iterations. I want to finish this in less than 5 minutes, preferably 2 minutes.

A third stage of this exercise is to parallelize the sampling step in z.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <stddef.h>
#include <limits.h>
#include <time.h> 
#include <ctype.h>              /* Char classes    ISOC  */
#include <string.h>             /* Strings         ISOC  */


#include <gsl/gsl_machine.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_cdf.h>
#include <gsl/gsl_cblas.h>

#include <R.h>
#include <Rmath.h>
#include <Rembedded.h>
#include <Rdefines.h>
//#include <R_ext/BLAS.h>
#include <R_ext/Lapack.h>
#include <R_ext/Linpack.h>
//#include <vecLib/cblas.h>      /* C BLAS          APPLE */
//#include "blaio.h"             /* Basic Linear Algebra I/O */


// source for the printing codes: //source: http://www.mitchr.me/SS/exampleCode/blas.html

#define REALT double

#ifndef REALT
#define REALT double
#endif

void sgeprt(int m, int n, REALT, char *c);
void dgeprt(int m, int n, REALT, char *c);

/* ****************************** - external functions from blaio.h.  */

void printVector(int n, REALT  *v,        /* Size and array                 */
                 int wide, int prec,      /* Width and precesion for floats */
                 char *pad,               /* Right pad string               */
                 char *ldel, char *rdel,  /* Left and right delimiter       */
                 char *tag                /* Tag for first line             */
  );

void printMatrix(const enum CBLAS_ORDER order, 
                 int n, int m, REALT  *a,  /* Size and array                 */
                 int wide, int prec,       /* Width and precesion for floats */
                 char *pad,                /* Right pad string               */
                 char *ldel, char *rdel,   /* Left and right delimiter       */
                 char *lidel, char *ridel, /* Left and right INNER delimiter */
                 char *tag                 /* Tag for first line             */
  );

int readMatrix(int *n, int *m,   /* Will contain size of the array after the read */
               REALT  *a,        /* Will point to the data */
               int maxEle,       /* Maximum number of elements to read */
               char *fileName    /* The file name to read the data from */
  );

void printMatrixThr(const enum CBLAS_ORDER order, 
                   int n, int m, REALT  *a,    /* Size and array                 */
                   char *inStr, char *outStr,  /* "in" string, and "out" string  */
                   REALT  minIn, REALT  maxIn, /* Min/Max values for "in" range. */
                   char *pad,                  /* Right pad string               */
                   char *ldel, char *rdel,     /* Left and right delimiter       */
                   char *lidel, char *ridel,   /* Left and right INNER delimiter */
                   char *tag                   /* Tag for first line             */
  );


double rTruncNorm (double, double*, double, double);

// C version of group lasso
void BinaryBayesianGroupLassoC (double *X, int *dim, int *y, int *iterations, double *bprev, double *Lprev) {  

    // X is design matrix with intercept
    //dim is dimensions of X
    int iter = 1, i;
    int n = dim[0];
    int p = dim[1];
    double sigma = 1, z[n], mu[n], u, a, b, *Li;

    // random number generation set up
    const double lowest_double = -GSL_DBL_MAX;
    const double highest_double = GSL_DBL_MAX;
    const gsl_rng *gBaseRand;       /* global rand number generator */
    unsigned long randSeed;         

    /* specifying to use Mersenne twister as the uniform PRNG */
    gBaseRand = gsl_rng_alloc(gsl_rng_mt19937);
    srand(time(NULL));                    /* initialization for rand() */
    randSeed = rand();                    /* returns a non-negative integer */
    gsl_rng_set(gBaseRand, randSeed);     /* seed the PRNG */

    while(iter < *iterations){

        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, p, 1, X, p, bprev, 1, 0, mu, 1);
        printVector(n, mu, 8, 3, NULL, NULL, NULL, "       mu"); // width=8, precision=3

        for (i = 0; i < n; i++) {
            u = gsl_ran_flat(gBaseRand, 0, 1);
            if(y[i]==1){
                a = 0.0 - mu[i];
                b = highest_double;
            } else {
                a = lowest_double;
                b = 0.0 - mu[i];
            }
            z[i] = mu[i] + rTruncNorm(sigma, &u, a, b);            
        }

        printVector(n, z, 8, 3, NULL, NULL, NULL, "       z"); // width=8, precision=3                  
        iter = iter + 1;

    }
    gsl_rng_free(gBaseRand);    
}



// gaussian truncated normal between [a,b], params - mu=0, sigma, unifrnd = u
double rTruncNorm(double sigma, double *u, double a, double b){

    double unifr = *u;
    double Fa = gsl_cdf_gaussian_P(a, sigma);
    double Fb = gsl_cdf_gaussian_P(b, sigma);
    double v = Fa + (Fb-Fa)*unifr;
    double x = gsl_cdf_gaussian_Pinv(v, sigma);
    return(x);
}

/* function definitions from blaio.h *******/

void printMatrixUbr(const enum CBLAS_ORDER order, /* CBLAS row order                */
                    int n, int m, REALT  *a,      /* Size and array                 */
                    char *inStr, char *outStr,    /* "in" string, and "out" string  */
                    REALT  minIn, REALT  maxIn,   /* Min/Max values for "in" range. */
                    int wide, int prec,           /* Width and precesion for floats */
                    int *rowPerm, int *colPerm,
                    char prtMode, /* b=bitmap, V=values, *=in/out) */
                    char *fileName, 
                    int maskMode, // (L-below diag, U=above diag, D-Diagnal, M-Mask, 0=NONE, \0-NONE
                    char *mask,
                    char *pad,                    /* Right pad string               */
                    char *ldel, char *rdel,       /* Left and right delimiter       */
                    char *lidel, char *ridel,     /* Left and right INNER delimiter */
                    char *tag                     /* Tag for first line             */
  );

/* ************************************************************************** */

void printVector(int n, REALT  *v, int wide, int prec, char *pad, char *ldel, char *rdel, char *tag) {
  printMatrixUbr(CblasRowMajor, 1, n, v, NULL, NULL, 0.0, 0.0, wide, prec, NULL, NULL, 'V', NULL, '0', NULL, pad, ldel, rdel, "", "", tag);
} /* end func printVector */

/* ************************************************************************** */
void printMatrix(const enum CBLAS_ORDER order, int n, int m, REALT  *a, int wide, int prec, char *pad, char *ldel, char *rdel, char *lidel, char *ridel, char *tag) {
  printMatrixUbr(order, n, m, a, NULL, NULL, 0.0, 0.0, wide, prec, NULL, NULL, 'V', NULL, '0', NULL, pad, ldel, rdel, lidel, ridel, tag);
} /* end func printMatrix */

/* ************************************************************************** */
void printMatrixThr(const enum CBLAS_ORDER order, int n, int m, REALT  *a, char *inStr, char *outStr, REALT  minIn, REALT  maxIn, char *pad, char *ldel, char *rdel, char *lidel, char *ridel, char *tag) {
  printMatrixUbr(order, n, m, a, inStr, outStr, minIn, maxIn, 0, 0, NULL, NULL, '*', NULL, '0', NULL, pad, ldel, rdel, lidel, ridel, tag);
} /* end func printMatrixThr*/

/* ************************************************************************** */
void printMatrixUbr(const enum CBLAS_ORDER order, /* CBLAS row order                  */
                    int n, int m, REALT  *a,      /* Size and array                   */
                    char *inStr, char *outStr,    /* "in" string, and "out" string    */
                    REALT  minIn, REALT  maxIn,   /* Min/Max values for "in" range.   */
                    int wide, int prec,           /* Width and precesion for floats   */
                    int *rowPerm, int *colPerm,   /* Permute rows i->xx[i]            */
                    char prtMode,                 /* b=bitmap, V=values, *=in/out     */
                    char *fileName,               /* if NULL, stdout.                 */
                    int maskMode,                 /* L, U, D, M-Mask, 0=NONE          */
                    char *mask,                   /* Mask (same size as a) ctrl print */
                    char *pad,                    /* Right pad string                 */
                    char *ldel, char *rdel,       /* Left and right delimiter         */
                    char *lidel, char *ridel,     /* Left and right INNER delimiter   */
                    char *tag                     /* Tag for first line               */
  ) {
  int i, j, iP, jP;
  int k, ldelLen, tagLen, cIdx, prtPerMask;
  REALT  pVal;

  if(inStr == NULL)  inStr  = "*";
  if(outStr == NULL) outStr = " ";
  if(wide < 0)       wide   = 5;
  if(prec < 0)       prec   = 2;
  if(ldel == NULL)   ldel   = "[";
  if(ridel == NULL)  ridel  = "]";
  if(lidel == NULL)  lidel  = "[";
  if(rdel == NULL)   rdel   = "]";
  if(pad  == NULL)   pad    = " ";
  if(tag  == NULL)   tag    = "";

  ldelLen = strlen(ldel);
  tagLen = strlen(tag);
  for(j=0; j<n; j++) {
    if(j==0) 
      printf("%s%s%s%s", tag, ldel, lidel, pad);
    else {
      for(k=0;k<tagLen;k++) printf(" ");
      for(k=0;k<ldelLen;k++) printf(" ");
      printf("%s%s", lidel, pad);
    } /* end if/else */
    for(i=0; i<m; i++) {
      if(colPerm != NULL)
        iP = colPerm[i];
      else
        iP = i;
      if(rowPerm != NULL)
        jP = rowPerm[j];
      else
        jP = j;
      if(order == CblasColMajor)
        cIdx = n*iP+jP;
      else
        cIdx = m*jP+iP;
      pVal = a[cIdx];
      // Figure out what the mask has to do with printing..
      if(maskMode == '0')
        prtPerMask = 1;
      else if(maskMode == 'L')
        prtPerMask = (iP<jP);  // Row order specific!  Fix this.
      else if(maskMode == 'D')
        prtPerMask = (iP==jP);
      else if(maskMode == 'U')
        prtPerMask = (iP>jP);  // Row order specific!  Fix this.
      else if(maskMode == 'M')
        prtPerMask = mask[cIdx];
      else
        prtPerMask = 1;

      if(prtMode == '*') {
        if( prtPerMask && (pVal >= minIn) && (pVal <= maxIn) )
          printf("%s%s", inStr, pad);
        else 
          printf("%s%s", outStr, pad);
      } else {
        if(prtPerMask)
          printf("%*.*f%s", wide, prec, pVal, pad);
        else
          printf("%*s%s", wide, outStr, pad);
      } /* end if/else */
    } /* end for */
    if(j==n-1)
      printf("%s%s\n", ridel, rdel);
    else
      printf("%s\n", ridel);
  } /* end for */
} /* end func printMatrixUbr*/

/* ************************************************************************** */
int readMatrix(int *n, int *m, REALT  *a, int maxEle, char *fileName) {
  int i, j;
  int inComment;
  int numNumbers;
  int ch;
  int lengthOfNumber;
  char numberBuffer[256];
  FILE *FP;

  FP = fopen(fileName, "r");
  if(FP == NULL) 
    return 1;
  i = j = 0;
  inComment = 0;
  lengthOfNumber = -1;
  numNumbers = 0;
  while(EOF != (ch = getc(FP))) {
    if(ch == '#') inComment = 1;               /* Enter comment upon # char. */
    if(inComment && (ch < 20)) inComment = 0;  /* Break out of comment upon ANY control char. */
    if( !(inComment)) {
      if(isdigit(ch) || (ch=='.') || (ch=='E') || (ch=='e') || (ch=='-') | (ch=='+')) {
        lengthOfNumber++;
        numberBuffer[lengthOfNumber] = ch;
      } else {
        if(lengthOfNumber>=0) {
          numberBuffer[lengthOfNumber+1] = '\0';
          lengthOfNumber = -1;
          numNumbers++;
          if(numNumbers==1) {
            *n = atoi(numberBuffer);
          } else if(numNumbers==2) {
            *m = atoi(numberBuffer);
          } else if (numNumbers<maxEle) {
            a[numNumbers-3] = atof(numberBuffer);
          } else {
            return(-1);
          } /* end if/else */
        } /* end if */
      } /* end if/else */
    } /* end if */
  } /* end while */
  fclose(FP);
  return(numNumbers-2);
} /* end func main */

Simulating random data in R and calling C function:

# simulate data
n = 1000000; p=30; iterations = 1000; burnin = 5;
X = as.data.frame(sapply(1:p, function(x) rnorm(n, 0, 1)))
Y = sample(c(0,1), n, replace=TRUE)



call_BinaryBayesianGroupLasso <- function(X,Y,iterations,burnin){

  # X is design matrix without intercept
  p = dim(X)[2]+1 # number of predictors including dummy vars and intercept
  n = dim(X)[1]
  z = rep(0,n)
  h = 1000 # hyperparameter for shrinkage (controls number of parameters in model)
  b = matrix(0, nrow=p, ncol=iterations) # store beta
  L = matrix(0, nrow=p, ncol=iterations) # store lambda in
  initmodel = glm(Y ~ ., data = X, family = binomial(link = "probit")) # full model initial estimates 

  # add intercept to design matrix
  X = cbind(rep(1,n), X) 
  colnames(X) = names(initmodel$coefficients)
  b[,1] = summary(initmodel)$coefficients[,1]
  L[,1] = summary(initmodel)$coefficients[,2]
  eye = diag(rep(1,n)) # used in mcmc step for b

  bprev=b[,1]
  Lprev=L[,1]

  # call C function for group lasso
  res = .C("BinaryBayesianGroupLassoC",
           as.double(as.matrix(X)),            
           as.integer(dim(X)),
           as.integer(Y), 
           as.integer(iterations),
           as.double(bprev),
           as.double(Lprev))

}
user1971988
  • 845
  • 7
  • 22
  • 1
    `while(iter < iterations)` should be: `while(iter < *iterations){` because `iterations` is a pointer and `iter` is int – Grijesh Chauhan Aug 24 '13 at 09:41
  • Some of those errors are pretty self-explanatory. e.g. `error: ‘i’ undeclared (first use in this function)`, you need `for( int i =0;...)` - i.e. declare the variable type. – Simon O'Hanlon Aug 24 '13 at 09:53
  • Sorry guys. I edited the post. I was accessing the wrong directory. Things have compiled with a slight hitch which I will update in the post soon. Thanks a lot guys. Girjesh - Yes that was an error. fixed it. Much appreciated. Simon - i was declared upstairs. I was worried about how I was linking libraries when building. Being new, the nested function error earlier completely threw me off. Thanks for your help. – user1971988 Aug 24 '13 at 09:59
  • Verbose - Thanks, but I found the error. – user1971988 Aug 24 '13 at 10:04
  • 3
    @user1971988 don't edit to remove the initial error. Now the question makes no sense. Rollback and add an answer to show how you solved it. Thanks. – Simon O'Hanlon Aug 24 '13 at 10:30
  • Done! Let me know of any comments/approaches. – user1971988 Aug 24 '13 at 18:54
  • @user1971988 good work adding your solution, but you should add it as an *answer*, not an edit to the question! It is perfectly acceptable to answer your own question - who knows, you may even get upvotes for it. Use the answer submission box at the bottom of the page. :-) – Simon O'Hanlon Aug 28 '13 at 21:05
  • Ah, well. Next time maybe. :) Once I have a complete working solution of this whole thing. I am already trying to figure out the 4th point in my "grand C + R + parallelization" endeavor: http://stackoverflow.com/questions/18497837/running-embedded-r-in-c – user1971988 Aug 28 '13 at 21:16

0 Answers0