
I am using spatstat to estimate the risk of pest introduction and spread from roads, highways, and other roadways. However, I believe I am running into memory limitations: my data is at a continental scale and my computer has only 16 GB of memory. The error message I receive when running spatstat's as.owin() and density.psp() functions is:

Error: cannot allocate vector of size X.X Gb

Some colleagues of mine have suggested that I might lessen the memory burden by converting the spatstat functions as.owin() and density.psp() to run as C++ via the Rcpp package. This technique is well outside my comfort zone, and I was hoping to get a sense from StackOverflow of whether it's even feasible before I dedicate many hours to it.

Specifically, my questions are:

  1. Has anyone converted spatstat functions to C++?
  2. How have other spatstat users worked around memory-limitation issues?

Any help and guidance would be greatly appreciated.

Many thanks,

Josh

Josh Persi
  • Your colleagues are obviously not familiar with `spatstat`. The `density.psp` function runs to 140 lines of code and calls heavily on other functions within the spatstat package family, many of which _are_ written and run in C code (which will likely be quicker than any C++ code you wrote yourself). Effectively, you would need to rewrite a large chunk of spatstat in C++ to do this (thousands of lines and months of debugging, even if you were highly skilled in C++ and spatial stats), and it would be very unlikely to give any performance gain. – Allan Cameron Mar 29 '22 at 22:08
  • I'm not sure what the answer to your problem is, but rewriting spatstat functions in C++ isn't it: most of the heavy computation is already run in C. You may need to chunk your data into regions to run your analysis. – Allan Cameron Mar 29 '22 at 22:10
  • Thanks @AllanCameron, your comments are very helpful and much appreciated. – Josh Persi Mar 29 '22 at 23:08
  • The usual quick fix when computational power is a limitation is to throw more resources at the problem: (1) buy hardware (e.g. more memory), (2) borrow more powerful machines (e.g. from another team), or (3) rent a virtual machine in the cloud. – user51187286016 Mar 30 '22 at 00:12
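
Allan Cameron's suggestion of chunking the data into regions could be sketched roughly as follows. `density_by_tiles` is a hypothetical helper, not a spatstat function: it splits a rectangular window into tiles with `quadrats()`, smooths the segments near each tile with `density()` (which dispatches to `density.psp`), and keeps only that tile's pixels. It assumes edge correction is switched off and that a buffer of three bandwidths is wide enough for the kernel tails; both are assumptions to check against your own analysis.

```r
# Sketch: segment density computed tile by tile, so only one tile's raster
# is held in memory at a time. Assumptions (hypothetical, not from the
# thread): L is a psp object with a rectangular window, edge correction is
# off, and segments more than `buffer` away from a tile contribute negligibly.
library(spatstat)

density_by_tiles <- function(L, sigma, eps, nx = 2, ny = 2,
                             buffer = 3 * sigma) {
  W <- Window(L)
  tile_list <- tiles(quadrats(W, nx = nx, ny = ny))
  lapply(tile_list, function(Ti) {
    # Enlarge the tile so nearby segments outside it still contribute
    Bi <- intersect.owin(grow.rectangle(Ti, buffer), W)
    Li <- L[Bi]                       # segments clipped to the buffered tile
    Di <- density(Li, sigma = sigma, edge = FALSE, eps = eps)
    Di[Ti, drop = FALSE]              # keep only the pixels inside the tile
  })
}

# Usage on a small synthetic pattern (made-up data)
set.seed(1)
W <- owin(c(0, 100), c(0, 100))
n <- 40
L <- psp(runif(n, 0, 100), runif(n, 0, 100),
         runif(n, 0, 100), runif(n, 0, 100), window = W)
chunks <- density_by_tiles(L, sigma = 4, eps = 1)
length(chunks)
```

Each tile is computed on its own pixel grid, and neighbouring grids may not line up exactly, so stitching the tiles back into one image (or writing each tile to disk) is left out of this sketch.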

1 Answer


Firstly, I strongly agree with the commenter who said that the quick fix is not to edit the code but to throw more computing resources at the existing code. It can be fiddly to use a cloud computing service, but it will take much more time to re-implement, test, and validate completely new source code.

But anyway:

The first thing to check is whether the pixel images that you want to create are too big to store in R memory at all. Try just creating `Z <- as.im(R, dimyx=d)`, where `R` is a rectangle containing your spatial domain and `d` gives the dimensions (rows, columns) of the desired image. If that fails with a message about memory limits, then you're going to need a bigger boat -- I mean, computer.
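
Before attempting that allocation, a rough size estimate can be done by hand. Assuming one double (8 bytes) per pixel for a real-valued image, a raster needs about rows × columns × 8 bytes. The grid sizes below are hypothetical, and the real peak usage of `as.im` or `density.psp` is likely to be higher because of intermediate copies.

```r
# Back-of-envelope size of one real-valued raster, in GiB.
# Assumption: one double (8 bytes) per pixel; actual peak memory use is
# likely higher because of intermediate objects.
raster_gib <- function(nrow, ncol, bytes_per_pixel = 8) {
  nrow * ncol * bytes_per_pixel / 1024^3
}

# Hypothetical continental grid: 50,000 x 50,000 pixels
raster_gib(50000, 50000)   # about 18.6 GiB, already more than 16 GB of RAM

# Hypothetical coarser grid: 10,000 x 10,000 pixels
raster_gib(10000, 10000)   # about 0.75 GiB
```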

The function `density.psp` has options `method="FFT"` (the default) and `method="C"`. Have you tried both of these? The FFT method uses more memory because it does the whole calculation in one enormous Fourier transform (after expanding the domain to several times its original size). The C method is a loop over all pixels and all segments; it is slower, but requires relatively little memory, apart from the storage for the output raster data. If `method="C"` fails because of insufficient memory, this would again suggest that the raster images you're trying to create are too large to store in R memory.
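
A minimal sketch of trying both algorithms on a small synthetic segment pattern. It assumes a recent spatstat installation, where `density()` dispatches to `density.psp`, `method` accepts `"FFT"` or `"C"`, and `dimyx` is passed on to set the raster size. The window, bandwidth, and raster size are made-up values for illustration.

```r
# Sketch: compare density.psp's two algorithms on made-up data.
library(spatstat)

set.seed(42)
W <- owin(c(0, 100), c(0, 100))              # hypothetical 100 x 100 window
n <- 50
L <- psp(x0 = runif(n, 0, 100), y0 = runif(n, 0, 100),
         x1 = runif(n, 0, 100), y1 = runif(n, 0, 100),
         window = W)

# Default algorithm: one large (padded) Fourier transform; more memory.
D_fft <- density(L, sigma = 5, method = "FFT", dimyx = c(256, 256))

# Loop over pixels and segments in C; slower, but memory use is
# dominated by the output raster.
D_c <- density(L, sigma = 5, method = "C", dimyx = c(256, 256))

# Both are pixel images on a 256 x 256 grid; inspect how close they are.
print(dim(D_fft$v))
print(summary(as.vector(D_fft$v - D_c$v)))
```

On real data, increasing `dimyx` step by step with each method shows which one runs out of memory first.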

The function `as.owin` is generic, with 28 methods. Which method is giving you trouble? What data are you converting to an `owin`?
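
One way to find out which `as.owin` method your data dispatches to is to walk the object's class vector, as S3 dispatch does. A minimal sketch; `which_as_owin_method` is a hypothetical helper, not part of spatstat, and `square(10)` and the built-in `cells` dataset stand in for your own objects.

```r
library(spatstat)

# Hypothetical helper: report which as.owin method S3 dispatch would pick
# for x, by checking each class of x in order, then the default method.
which_as_owin_method <- function(x) {
  for (cl in c(class(x), "default")) {
    if (!is.null(utils::getS3method("as.owin", cl, optional = TRUE))) {
      return(paste0("as.owin.", cl))
    }
  }
  NA_character_
}

which_as_owin_method(square(10))   # a window
which_as_owin_method(cells)        # a point pattern
```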

spatstat is already written in a mix of R, C and C++. We are constantly looking for ways to accelerate the code and reduce the memory demand. If you have identified a particular case where the code is slow, we would like to know the details. If you do spot a way to fix or accelerate some of the code, please share it.

Adrian Baddeley
  • Thanks for pointing out the consequences of the method choice, `"C"` or `"FFT"`; I hadn't given sufficient thought to that. – Chris Mar 30 '22 at 05:27