Run R script and hide the actual code from user

Question

I have created an R code script that:

Reads some data from a database,
makes some transformations, and
exports the modified table to a CSV file.

This code needs to run on a client's machine, but we need to "hide" the actual code from the user.

Are there any useful suggestions on how we can achieve that?

This is going to be nearly impossible. If you have very unsophisticated users you might be able to obfuscate (e.g. by having them run a Shiny app locally). https://www.researchgate.net/post/How_to_make_invisible_the_R_code — Ben Bolker, Jan 26 '21 at 17:20
@BenBolker why is this impossible? Can't this R script be "exported" as an .exe (or some other alternative) and run from the client instead? — Vamkos, Jan 26 '21 at 17:24
Vamkos, unfortunately R doesn't provide the ability to compile/export a `.exe` from a script, perhaps unlike some other languages. In general, your only reasonable way to obscure code from the end-users is to not run the R stuff on their computers. Options for this: run it as a plumber API end-point (you have to host it somewhere), a shiny app, or perhaps Rserve (though that's a bit more difficult to secure well in a cross-network topology). — r2evans, Jan 26 '21 at 17:48
Some methods of deploying R-code to end users without requiring them to know R (or even manage an R instance) are focused on handling the installation and such, not at all towards securing the running code; a determined user will most likely be able to get the full source without too much effort. Such efforts include [RInno](https://github.com/ficonsulting/RInno) and [DesktopDeployR](https://github.com/wleepang/DesktopDeployR) among others. I am not recommending for or against either, just to demonstrate that most effort is in dealing with IT-challenged users, not the curious ones. — r2evans, Jan 26 '21 at 17:51
Bottom line, the only reasonable (and certainly the best) way to secure R code from curious users is to run all of the R-based processing on computers you control. — r2evans, Jan 26 '21 at 17:52
This has been discussed various times in various venues over the years (probably mostly in R mailing lists). @r2evans, you could post your comments as a canonical SO answer (assuming it's not somewhere on the site already ...) — Ben Bolker, Jan 26 '21 at 21:00
@Vamkos, did you ever come to some form of resolution? I know my answer did not give you steps on "how to do this". Your question is not unique (I myself was asking similar questions years before). — r2evans, Jun 02 '22 at 12:56

score 7 · Answer 1 · answered Jan 26 '21 at 22:10

Up front

... it will be nearly impossible to deploy an R <something> to another computer in a way that prevents curious users from accessing the source code.

From a mailing list conversation in 2011, in response to "I would not like anyone to be able to read the code.",

R is an open source project, so providing ways for you to do this is not one of our goals.

Duncan Murdoch https://stat.ethz.ch/pipermail/r-help/2011-July/282755.html

(Prof Murdoch was on the R Core Team and R Foundation for many years.)

Background

Several (many?) programming languages provide the ability to compile a script or program into an executable, the .exe you reference. For example, Python has tools like py2exe and PyInstaller. These tools range from merely bundling the script into a zip-ball, perhaps with some obfuscation, to actually creating an exe with the script tightly embedded. (This part could use some more citations/research.)

This is usually good enough for many people, by keeping the honest out. I say it that way because all you need to do is google phrases like decompile py2exe and you'll find tools, howtos, tutorials, etc, whose intent might be honestly trying to help somebody recover lost code. Regardless of the intentions behind those tools, their existence means that compiling to an exe will only slow curious users down.

Unfortunately, there are no tools that do this easily for R.

There are tools with the intent of making it easy for non-R-users to use R-based tools. For instance, RInno and DesktopDeployR are two tools with the intent of creating Windows (no mac/linux) installers that support R or R/shiny tools. But the intent of tools like this is to facilitate the IT tasks involved with getting a user/client to install and maintain R on their computer, not with protecting the code that it runs.

Constrain `R.exe`?

There have been questions (elsewhere?) that ask if they can modify the R interpreter itself so that it does not do everything it is intended to do. For instance, one could redefine base::print in such a way that functions' contents cannot be dumped, and debug doesn't show the code it's about to execute, and perhaps several other protective steps.

There are a few problems with this approach:

There is always another way to get at a function's contents. Even if you stop print.default and the debugger from doing this, there are other ways to get to the functions (body(.), for one). How many of these rabbit holes do you feel you can accurately traverse, and can you close them all with no adverse effect on normal R code?
Even if you feel you can get to them all, are you encrypting the source .R files that contain your proprietary content? Okay, encrypting is good, except you need to decrypt the contents somehow. Many tools that have encrypted contents do so to thwart reverse-engineering, so they also embed (obfuscatedly, of course) the decryption key in the application itself. Just give it time, somebody will find and extract it.

You might think that you can download the key on start-up (not stored within the app), so that the code is decrypted in real-time. Sorry, network sniffers will get the key. Even if you retrieve it over https://, tools such as https://mitmproxy.org/ will render this step much less effective.
Let's say you have recompiled R to mask print and such, have a way to distribute source code encrypted, and are able to decrypt it in a way that does not easily reveal the key (for full decryption of the source code files). While it takes a dedicated user to wade through everything above to get to the source code, none of the above steps are required: they may legally compel you to release your changes to the R interpreter itself (that you put in place to prevent printing function contents). This doesn't reveal your source code, but it will reveal many of your methods, which might be sufficient. (Or just the risk of legal costs.)

R is GPL, and that means that anything that links to it is also "tainted" with the GPL. This means that anything compiled with Rcpp, for instance, will also be constrained/liberated (your choice) by the GPL. This includes thoughts of using RInside: it is also GPL (>= 2).

To do it without touching the GPL, you'd need to write your interpreter (relatively from scratch, likely) without code from the R project.
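To illustrate the first problem in the list above, here is a minimal sketch in plain R. The function name `secret_transform` and its contents are made up for illustration, and the "protection" shown (masking the print method for functions) is only an assumed stand-in for whatever changes one might make to the interpreter. The point is just that several unrelated base functions still hand back the code.

```r
# Hypothetical proprietary function (name and contents are illustrative only).
secret_transform <- function(x) {
  x * 42 + 1
}

# Naive "protection": mask the print method used for functions.
print.function <- function(x, ...) cat("<source hidden>\n")

# Explicit printing now goes through the masked method ...
print(secret_transform)

# ... but other base-R accessors do not go through print at all.
recovered_body <- deparse(body(secret_transform))     # the code inside the function
recovered_args <- names(formals(secret_transform))    # its argument names
recovered_full <- deparse(secret_transform)           # the whole definition as text

cat(recovered_full, sep = "\n")
```

Each of `body`, `formals`, and `deparse` would need its own patch, and that is before considering `as.list`, `dput`, the source files on disk, and so on.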

Alternatives

Ultimately, if you want to release R-based utilities/apps/functionality to clients, the only sure-fire way to allow them to use your code without seeing it is to ... control the computers on which R will run (and source code will reside). I'll add more links supporting this claim as I find them, but a small start:

Options include anything that keeps the R code and R interpreter completely under your control. Simple examples:

Shiny apps, self-hosted (or on shinyapps.io if you trust their security); servers include Shiny Server (both free and commercial versions), RStudio Connect (commercial only), and ShinyProxy. (The list is not meant to be exhaustive.)
plumber (rplumber.io) is an API server, not a shiny server. The intent is for single HTTP(S) endpoint calls, possibly authenticated, supporting whatever HTTP supports (POST, GET, etc). This can be served in various ways, see its hosting page for options.
Rserve. I know less about this, but from what I've experienced with it, I've not had as much luck integrating with enterprise systems (where, e.g., authentication and fine-control over authorization is important). This does allow near-raw access to R, so it might not be what you want (especially when the intent is to give to clients who may not be strong R users themselves).
OpenCPU should be discussed, but not as a viable candidate for "protect your code". It is very similar to plumber in that it provides HTTP endpoints, but it supports endpoints for every exported function in every package installed in its R library. This includes the base package, so it is not at all difficult to get the source code of any function that you could get on the R console. I believe this is a design feature, even if it is perfectly at odds with your intent to protect your code.
Anything that can call R or Rscript. This might be PHP or mod_python or similar. Any web-page serving language that can exec("/usr/bin/Rscript",...) can take its output and turn it around to the calling agent. (It might also be possible, for example, for a PHP front-end to call an opencpu endpoint that only permits connections from the PHP-serving host.)
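As a rough sketch of the plumber option above, assuming the plumber package is installed on a server you control: the file below would live only on that server, and the client would only ever see the HTTP response. The helper names (`transform_table`, `build_export`), the `/export` path, and the literal data frame standing in for the database query are all hypothetical.

```r
# plumber.R -- kept only on a machine you control; clients never receive this file.

# Hypothetical stand-in for the proprietary transformation.
transform_table <- function(df) {
  df$value_scaled <- df$value / max(df$value)
  df
}

# Hypothetical stand-in for "read from the database, then transform".
build_export <- function() {
  raw <- data.frame(id = 1:3, value = c(10, 20, 40))  # placeholder for a real query
  transform_table(raw)
}

#* Return the transformed table (serialized as JSON by plumber's default serializer)
#* @get /export
function() {
  build_export()
}

# To serve this file (requires the plumber package), one would run something like:
#   plumber::pr_run(plumber::pr("plumber.R"), port = 8000)
```

The client side then reduces to an HTTP GET against `/export` and writing the result to a CSV locally; authentication and transport security still need to be handled by however you host it.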

wow, thinking about offering a bounty to give this question more points. — Ben Bolker, Jan 26 '21 at 22:14
