1

In a very first attempt at creating a C++ function which can be called from R using Rcpp, I have a simple function to compute a minimum spanning tree from a distance matrix using Prim's algorithm. This function has been converted into C++ from a former version in ANSI C (which works fine).

Here it is:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame primlm(const int n, NumericMatrix d)
{
    double const din = 9999999.e0;
    long int i1, nc, nc1;
    double dlarge, dtot;
    NumericVector is, l, lp, dist;
    l(1) = 1;
    is(1) = 1;
    for (int i=2; i <= n; i++) {
        is(i) = 0;
    }
    for (int i=2; i <= n; i++) {
        dlarge = din;
        i1 = i - 1;
        for (int j=1; j <= i1; j++) {
            for (int k=1; k <= n; k++) {
                if (l(j) == k)
                    continue;
                if (d[l(j), k] > dlarge)
                    continue;
                if (is(k) == 1)
                    continue;
                nc = k;
                nc1 = l(j);
                dlarge = d(nc1, nc);
            }
        }
        is(nc) = 1;
        l(i) = nc;
        lp(i) = nc1;
        dist(i) = dlarge;
    }
    dtot = 0.e0;
    for (int i=2; i <= n; i++){
        dtot += dist(i);
    }
    return DataFrame::create(Named("l") = l, 
                          Named("lp") = lp, 
                          Named("dist") = dist, 
                          Named("dtot") = dtot);
}

When I compile this function using Rcpp under RStudio, I get two warnings, complaining that variables 'nc' and 'nc1' have not been initialized. Frankly, I could not understand that, as it seems to me that both variables are being initialized inside the third loop. Also, why there is no similar complaint about variable 'i1'?

Perhaps it comes as no surprise that, when attempting to call this function from R, using the below code, what I get is a crash of the R system!

# Read test data
df <- read.csv("zygo.csv", header=TRUE)
lonlat <- data.frame(df$Longitude, df$Latitude)
colnames(lonlat) <- c("lon", "lat")

# Compute distance matrix using geosphere library
library(geosphere)
d <- distm(lonlat, lonlat, fun=distVincentyEllipsoid)

# Calls Prim minimum spanning tree routine via Rcpp
library(Rcpp)
sourceCpp("Prim.cpp")
n <- nrow(df)
p <- primlm(n, d) 

Here is the dataset I use for testing purposes:

"Scientific name",Locality,Longitude,Latitude Zygodontmys,Bush Bush
Forest,-61.05,10.4 Zygodontmys,Cerro Azul,-79.4333333333,9.15
Zygodontmys,Dividive,-70.6666666667,9.53333333333 Zygodontmys,Hato El
Frio,-63.1166666667,7.91666666667 Zygodontmys,Finca Vuelta
Larga,-63.1166666667,10.55 Zygodontmys,Isla
Cebaco,-81.1833333333,7.51666666667 Zygodontmys,Kayserberg
Airstrip,-56.4833333333,3.1 Zygodontmys,Limao,-60.5,3.93333333333
Zygodontmys,Montijo Bay,-81.0166666667,7.66666666667
Zygodontmys,Parcela 200,-67.4333333333,8.93333333333 Zygodontmys,Rio
Chico,-65.9666666667,10.3166666667 Zygodontmys,San Miguel
Island,-78.9333333333,8.38333333333
Zygodontmys,Tukuko,-72.8666666667,9.83333333333
Zygodontmys,Urama,-68.4,10.6166666667
Zygodontmys,Valledup,-72.9833333333,10.6166666667

Could anyone give me a hint?

maurobio
  • 1,480
  • 4
  • 27
  • 38

1 Answers1

4

The initializations of ncand nc1 are never reached if one of the three if statements is true. It might be that this is not possible with your data, but the compiler has no way knowing that.

However, this is not the reason for the crash. When I run your code I get:

Index out of bounds: [index=1; extent=0].

This comes from here:

NumericVector is, l, lp, dist;
l(1) = 1;
is(1) = 1;

When declaring a NumericVector you have to tell the required size if you want to assign values by index. In your case

NumericVector is(n), l(n), lp(n), dist(n);

might work. You have to analyze the C code carefully w.r.t. memory allocation and array boundaries.

Alternatively you could use the C code as is and use Rcpp to build a wrapper function, e.g.

#include <array>
#include <Rcpp.h>
using namespace Rcpp;

// One possibility for the function signature ...
double prim(const int n, double *d, double *l, double *lp, double *dist) {
    ....
} 

// [[Rcpp::export]]
List primlm(NumericMatrix d) {
    int n = d.nrow();
    std::array<double, n> lp;    // adjust size as needed!
    std::array<double, n> dist;  // adjust size as needed!
    double dtot = prim(n, d.begin(), l.begin(), lp.begin(), dist.begin()); 

    return List::create(Named("l") = l, 
                        Named("lp") = lp, 
                        Named("dist") = dist, 
                        Named("dtot") = dtot);
}

Notes:

  • I am returning a List instead of a DataFrame since dtot is a scalar value.
  • The above code is meant to illustrate the idea. Most likely it will not work without adjustments!
Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • Dear Ralf, thanks a lot for your quick and informative reply. In fact, forgetting to set the dimensions of the vectors was a primary mistake. However, just setting them as per your suggestion did not solved the problem, as I still get the same message, except that it now comes with the correct number of dimensions: "Index out of bounds: [index=15; extent=15].". I am aware of the zero-based C++ arrays, but this algorithm works fine in ANSI C, which also uses zero-based arrays. Anyway, changing the indexes did not solved the problem. For the time being, I feel quite frustrated with Rcpp.. – maurobio Jan 25 '19 at 00:31
  • @maurobio Ok. You have two possibilities: 1) Carefully analyze the C code and translate it to C++ (with or without Rcpp data structures). 2) Use Rcpp to build a wrapper function for the existing C code. See the updated answer. For well-tested and complex C code i prefer the latter approach. – Ralf Stubner Jan 25 '19 at 12:51
  • Ralf, thanks again for the insigthful comments. In fact, I would prefer to do the things by means of a wrapper which would allow the use of existing C/C++ code with as few modifications as possible. I will try your more recent suggestion. – maurobio Jan 25 '19 at 13:17
  • Yet another possibility would be to use another C++ implementation of the minimum-spanning tree algorithm which could better integrate with Rcpp; – maurobio Jan 25 '19 at 13:19
  • 1
    @maurobio Generally there should be *no* modification of the C/C++ code necessary. Rcpp is used quite often to wrap external libraries that are obviously out of control of the R(cpp) user. – Ralf Stubner Jan 25 '19 at 15:33