R is CPU-bound more than anything else, so it is recommended to pick one of the newer-generation compute-optimized instance types, preferably with an SSD disk.
I've recently run into a problem with high memory usage (quickly rising to 100%) during load testing. To reproduce: there is an R package whose processing time is up to 0.2 under no-stress conditions. When I query one of its endpoints with curl for 1000 JSON requests from 3 machines in parallel, all of the memory is suddenly consumed, which results in 'cannot fork' or:
cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory' In call: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)
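Roughly, the load test on each machine looks like the following (the host, package, and function names here are placeholders, not my real setup):

```shell
# Fire 1000 POST requests, 50 in flight at a time, against a hypothetical
# OpenCPU JSON endpoint; -w prints one HTTP status code per request.
seq 1000 | xargs -P 50 -I{} \
  curl -sk -o /dev/null -w "%{http_code}\n" \
    -H 'Content-Type: application/json' \
    -d '{"n": 10}' \
    "https://ocpu.internal.example/ocpu/library/mypkg/R/myfun/json"
```

Running this simultaneously from 3 boxes is enough to exhaust memory on both servers.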
The setup is 2x AWS 8 GB compute-optimized servers plus a load balancer, all in a private network. HTTPS is enabled, and my main use case is online processing of requests, so I'm mostly querying /json endpoints.
Do you have any suggestions on how to approach this issue? The plan is to have more packages installed (more online processes requesting results from various functions), and I don't want to end up needing 32 GB of RAM per box.
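One knob I'm aware of: the OpenCPU server configuration supports per-request resource limits, so something along these lines (the values below are guesses for illustration, not tested recommendations) might at least turn a runaway request into a clean per-request error instead of machine-wide exhaustion:

```json
{
  "rlimit.as": 2e9,
  "rlimit.nproc": 50,
  "timelimit.post": 30,
  "preload": ["mypkg"]
}
```

But I'm unsure which of these actually matter for my workload, hence the question.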
All of the packages are deployed with the following options:
LazyData: false
LazyLoad: false
They are also added to the preload section of serverconf.yml.j2. The .RData files are loaded within an .onLoad function by calling utils::data.
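Concretely, the loading hook looks something like this ("mydata" is a placeholder for the actual dataset names):

```r
# .onLoad runs once when the package namespace is loaded (i.e. at preload).
# With LazyData: false, utils::data() is what pulls the .RData objects in;
# envir= keeps them in the package namespace rather than the global env.
.onLoad <- function(libname, pkgname) {
  utils::data("mydata", package = pkgname,
              envir = parent.env(environment()))
}
```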
Also, keeping in mind that I'm using OpenCPU without GitHub and with only one-way communication (from the backend to the ocpu box), which options do you suggest turning on or optimizing? This isn't clearly stated in the docs yet.