
[ Edit: June 2013 ] A paper has appeared on arXiv describing this issue in greater detail and suggesting some solutions: http://arxiv.org/abs/1303.4808. It will appear in the Journal of Statistical Software later in 2013.

I have a cronjob on my Ubuntu servers that downloads and installs every source package from CRAN. However, on the same server I have started to notice some irregular activity. It might be totally unrelated, but it got me thinking about whether some CRAN packages could contain malicious code.

The process of creating and publishing a CRAN package is extremely easy. Maybe a little too easy. You upload your package to the CRAN FTP server, Kurt Hornik runs a check, and the package is published. Given the volume of R packages being uploaded every day, it is reasonable to assume that no extensive auditing of each package takes place. Also, unlike most Linux distribution packages, CRAN packages are not signed with a private key. Even the email address in the DESCRIPTION file is rarely verified.
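
Since CRAN packages carry no signature, the closest thing available on the client side is pinning: record a SHA-256 digest of a tarball once it has been reviewed, and refuse to install anything that no longer matches. The sketch below is a minimal illustration of that idea, assuming you keep your own table of reviewed digests (the `REVIEWED` dict and the tarball names are hypothetical). Note that a checksum only detects a tarball changing after review; unlike a signature, it says nothing about who produced it.

```python
import hashlib
from pathlib import Path

# Hypothetical table of digests you recorded when you last reviewed each tarball,
# e.g. {"somepkg_1.0.tar.gz": "<64 hex chars>"}. Not a CRAN feature.
REVIEWED: dict[str, str] = {}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_reviewed(path: Path, reviewed: dict[str, str] = REVIEWED) -> bool:
    """True only if this tarball's name is known and its digest still matches."""
    expected = reviewed.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

A cron job would call `is_reviewed()` on each downloaded tarball and skip anything that returns `False`, which of course means every new or updated package needs a human look before it gets installed.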

Now, it would not be very hard to include code that installs a rootkit, either at compile time or at run time. Compile time is probably the more vulnerable stage, because I install my packages using sudo, which I should probably stop doing. But a lot can be done at run time as well. The Linux kernel has had several security vulnerabilities lately, and I have confirmed for myself that it can be extremely easy to obtain root via a privilege escalation exploit on a completely up-to-date system. As R usually has internet access, the malicious code does not even have to be included in the package; it can simply be downloaded from somewhere using wget or download.file().
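
Because the payload can be fetched at install or load time, one cheap (and easily defeated) first pass is to grep a package's R sources for calls that reach the network or the shell, such as `download.file()`, `system()`, `system2()`, or a literal `wget`/`curl`. The sketch below is a naive regex scan, assuming an unpacked package directory; the pattern list is an illustrative assumption, not an exhaustive one. It will produce false positives (a package that legitimately uses the curl package), and obfuscated code will slip straight past it. It also only reads R files, while a package's `configure` script and the C/C++ sources under `src/` run or get compiled at install time too.

```python
import re
from pathlib import Path

# Illustrative patterns only; a determined author can trivially evade these.
SUSPICIOUS = {
    "download.file": re.compile(r"\bdownload\.file\s*\("),
    "system": re.compile(r"\bsystem2?\s*\("),          # system( and system2(
    "shell downloader": re.compile(r"\b(?:wget|curl)\b"),
    "eval(parse(...))": re.compile(r"\beval\s*\(\s*parse\s*\("),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for each suspicious match in R source text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_package(pkg_dir: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .R/.r file under an unpacked package directory."""
    report = {}
    for path in sorted(pkg_dir.rglob("*")):
        if path.is_file() and path.suffix in (".R", ".r"):
            hits = scan_text(path.read_text(errors="replace"))
            if hits:
                report[str(path.relative_to(pkg_dir))] = hits
    return report
```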

That said, are R users considering this at all? Or is the philosophy mostly that you should only download packages from people you trust? Even then, without package signing that is not very reliable. What would be a safer approach to installing CRAN packages? I have considered something like a separate machine for building packages and then copying over the binaries, and always running R in a sandbox. That is a little cumbersome, though.
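
A lightweight step toward the sandbox idea, and away from installing with sudo, is to build packages as a dedicated unprivileged account into a library directory that account owns, so install-time code at least does not start out as root (kernel privilege-escalation bugs remain a risk, as noted above). The sketch below only assembles and optionally runs the command. The account name `rbuild` and the library path are hypothetical; it assumes `sudo` is configured to let the caller run commands as that account and that the library directory already exists, and it relies on the `--library` option of `R CMD INSTALL`.

```python
import shlex
import subprocess
from pathlib import Path

BUILD_USER = "rbuild"                     # hypothetical unprivileged account
LIBRARY = Path("/home/rbuild/R/library")  # hypothetical library owned by BUILD_USER


def install_command(tarball: Path, user: str = BUILD_USER,
                    library: Path = LIBRARY) -> list[str]:
    """Build an argv that installs a source tarball as `user` into `library`."""
    return [
        "sudo", "-u", user, "-H", "--",    # drop to the build account, with its HOME
        "R", "CMD", "INSTALL", f"--library={library}", str(tarball),
    ]


def install(tarball: Path, dry_run: bool = True) -> list[str]:
    """Print the command; only execute it when dry_run is False."""
    cmd = install_command(tarball)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

The dedicated account only limits what install-time code can touch directly; a chroot, container, or throwaway VM would isolate it much more thoroughly.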

Jeroen Ooms
  • A first step would be to create a specific user, say "R", to build and install R and R packages: malicious code would not have direct root access (the escalation problems remain, of course). – Vincent Zoonekynd Jan 14 '12 at 22:16
  • Vote to close: not an R programming question but an R policy question, so it doesn't belong on Stack Overflow; it is also discussion-oriented. – Spacedman Jan 14 '12 at 22:41
  • There is the question of how many R users would install the malicious package. I certainly wouldn't replace a long-standing and useful package like ggplot2 or lme4 with a package of new and unknown usefulness. The biggest risk I can see would be in specific areas, where packages don't already exist, and by definition there are only a few people working in that area. Wouldn't they all know each other, so a package created by someone they have never heard of would raise eyebrows? – Michelle Jan 14 '12 at 22:46
  • It would be cute to create a malicious package that was spelled nearly the same as a popular package -- for example there is no longer a `ggplot` on CRAN, just `ggplot2` ... although I guess you'd have to get Kurt Hornik to accept it ... – Ben Bolker Jan 14 '12 at 22:52
  • This is pretty nonsensical, Jeroen. Vincent is correct in pointing out that you can simply run this as a regular user rather than root (and should. If you insist on "all" CRAN packages, you should probably use a chroot or virtual machine anyway; see my answer to your other question). Spacedman also raises a valid point, as this has nuttin' to do with R per se. You'd have the same issue with CPAN, PyPI, and whatever other public repo you look at. So I vote to close too. – Dirk Eddelbuettel Jan 14 '12 at 23:39
  • Hmz. I could move it to Server Fault, but there are not a lot of R users over there. – Jeroen Ooms Jan 14 '12 at 23:52
  • @Dirk I do think the question is somewhat specific to R, as CRAN is tightly integrated with R; it is hard to use R without touching CRAN. Also, I am pretty sure that creating an R user is not going to solve the problem. MySQL runs as user 'mysql', but if you run a buggy CMS with SQL injection vulnerabilities, you'll be hacked within an hour. I am going to look into chroot. Would this work at runtime or only during installation of packages? – Jeroen Ooms Jan 15 '12 at 00:06
  • Runtime. It's like a virtual machine, only simpler. The BSD variants have something called jail. This is really just *up to you as your own sysadmin* to set your machine up in a way that cannot harm you, and has nothing to do with R packages per se or how they are distributed. Remember, you decided to set up a cron job to install (semi-)random code. – Dirk Eddelbuettel Jan 15 '12 at 00:22
  • Isn't there also a problem in installing *every* R package? Seems like it would be just as bad to install every package for Debian or Red Hat. Why install that which you don't need, don't use, and don't monitor? – Iterator Jan 15 '12 at 08:28
  • I do need everything :-) It's for my OpenCPU framework. – Jeroen Ooms Jan 16 '12 at 07:01
  • @Jeroen I would recommend moving to Server Fault. There are reasonable R administration topics to be discussed. It doesn't hurt to pop into the SO R chatroom when you've posted an R question outside of SO & CV, to let folks know to look over at SF or another site. – Iterator Feb 08 '12 at 13:50
  • As for this question, I wouldn't say that this is the biggest risk, not least because code served on CRAN is associated with a maintainer/author. Other risks in production environments include locking down shell access, locking down ports/servers, and a number of other things. – Iterator Feb 08 '12 at 13:53
  • @Jeroen, are you asking about Linux only (the tags and title don't say), or all OSes? R installs on Windows are often painful, and sometimes it's easier to install as administrator; that's in theory a security hole. – smci Nov 23 '16 at 15:23
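
Ben Bolker's `ggplot`/`ggplot2` example is a typosquatting risk, and it is easy to screen for mechanically: compare a new package name against names you already trust and flag anything within a small edit distance. The sketch below uses plain Levenshtein distance on lowercased names with a threshold of 1; the `TRUSTED` list and the threshold are illustrative assumptions, and a near-miss name is only a reason for a closer look, not proof of malice.

```python
TRUSTED = ["ggplot2", "lme4", "Rcpp", "plyr"]  # hypothetical allow-list


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def lookalikes(name: str, trusted: list[str] = TRUSTED, max_dist: int = 1) -> list[str]:
    """Trusted names that `name` is suspiciously close to, but not identical with."""
    return [t for t in trusted
            if t != name and levenshtein(name.lower(), t.lower()) <= max_dist]
```

Comparing lowercased names means a case-only variant such as `rcpp` is flagged against `Rcpp` even though CRAN treats them as different names.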
