Questions tagged [r]

R is a free, open-source programming language & software environment for statistical computing, bioinformatics, visualization & general computing. Please use minimal reproducible examples others can run using copy & paste. Show desired output entirely. Use dput() for data & specify all non-base packages with library(). Don't embed pictures for data or code, use indented code blocks instead. For statistics questions, use https://stats.stackexchange.com.

R Programming Language

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, information graphics, and general computing. It is a multi-paradigm language and dynamically typed. R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. R was created by Ross Ihaka and Robert Gentleman and is now developed by the R Development Core Team. The R environment is easily extended through a packaging system on CRAN, the Comprehensive R Archive Network.

Scope of questions

This tag should be used for programming-related questions about R. Including a minimal reproducible example in your question will increase your chances of getting a timely, useful answer. Questions should not use the rstudio tag unless they relate specifically to the RStudio interface and not just the R language.
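
As a rough illustration of the advice above, a reproducible example could be put together as follows. The data frame `df` and its columns are invented for this sketch; in a real question you would use a small slice of your own data.

```r
# Hypothetical toy data; substitute a small sample of your own
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# dput() prints R code that recreates the object
dput(df)

# Pasting the dput() output into a question lets others rebuild the data
df2 <- structure(list(id = 1:3, score = c(2.5, 3.1, 4.8)),
                 class = "data.frame", row.names = c(NA, -3L))

# The rebuilt object should match the original
all.equal(df, df2)
```

Any non-base package the example needs would be loaded at the top with library().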

If your question is more focused on statistics or data science, use Cross Validated or Data Science, respectively. Bioinformatics-specific questions may be better received on Bioconductor Support or Biostars. General questions about R (such as requests for off-site resources or discussion questions) are unsuitable for Stack Overflow and may be appropriate for one of the general, or special-interest, R mailing lists.

Please do not cross-post across multiple venues. Do research (read tag wikis, look at existing questions, or search online) to determine the most appropriate venue so that you have a better chance of receiving solutions to your question. Your question may be automatically migrated to a more appropriate Stack Exchange site. If you receive no response to your questions after a few days, or if your question is put on hold for being off-topic, it is then OK to post to another venue, giving a link to your Stack Overflow question - but don't cross-post just because your question is down-voted or put on hold for being unclear. Instead, work on improving your question.

Stack Overflow resources

Official CRAN Documentation

Other CRAN resources

Free Resources

Interactive R learning

  • Coursera - Learn how to use R for effective data analysis
  • DataCamp - Many interactive R and data science courses
  • Dataquest - Interactive R courses for data science
  • edX - Basic Statistics and R (basic course, not just for life sciences)
  • edX - Introduction to R Programming
  • R-exercises - 1000+ R exercises and solutions
  • RPubs - Easy web publishing from R
  • Swirl - R-package to learn R interactively

Free books on R:

Programming Chrestomathy (problems written in many languages)

Other free resource materials

IDEs and editors for R

Web application framework for R

  • Shiny - Turn your analyses into interactive web applications. No HTML, CSS, or JavaScript knowledge required.
  • FastRWeb - Fast Interactive Web Framework for Data Mining Using R

Graphical User Interfaces (GUI) in R

Code style guides

Other Resources

Recommended additional R resources include:

Alternative R engines

All alternative R engines aim to improve R's performance and memory management.

Downstream distributions with complete compatibility

Forks of R with near 100% code compatibility

  • pqR by Radford Neal (C-based).
  • Rho by Karl Millar, based upon CXXR by Andrew Runnalls (C++-based). Development of Rho has been suspended indefinitely.

Rewrites with high code compatibility

  • Renjin by BeDataDriven (Java-based).
  • TERR by Tibco (C++-based).

Experimental and early-stage rewrites

  • Riposte by Justin Talbot (C++-based).
  • FastR by Jan Vitek and Tomas Kalibera (Java-based).

Unrelated tags

Due to R's simple name, questions sometimes get tagged with the r tag when a different topic is meant. Here is a list of tags that mistagged R questions might be re-tagged to:

  • for questions related to the file R.java on Android
  • "A command line tool for running JavaScript scripts that use the Asynchronous Module Definition API (AMD) for declaring and using JavaScript modules and regular JavaScript script files. It is part of the RequireJS project, and works with the RequireJS implementation of AMD." (from the wiki summary)
  • For questions related to RStudio, use the rstudio tag. Don't use this tag just because you are working with RStudio.
496,613 questions
57 votes, 4 answers

preallocate list in R

It is inefficient in R to expand a data structure in a loop. How do I preallocate a list of a certain size? matrix makes this easy via the ncol and nrow arguments. How does one do this in lists? For example: x <- list() for (i in 1:10) { …
Alex
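
One common way to do what the question above asks, sketched here rather than taken from the thread, is `vector("list", n)`. The size `n` and the squared values are placeholders.

```r
n <- 10

# vector("list", n) creates a list of n NULL elements up front
x <- vector("list", n)

for (i in seq_len(n)) {
  x[[i]] <- i^2   # fill an existing slot instead of growing the list
}

length(x)
```
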
57 votes, 7 answers

rank and order in R

I am having trouble understanding the difference between the R function rank and the R function order. They seem to produce the same output:

    > rank(c(10,30,20,50,40))
    [1] 1 3 2 5 4
    > order(c(10,30,20,50,40))
    [1] 1 3 2 5 4

Could somebody shed some…
Alex
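
A small sketch of the distinction, not taken from the thread: the two functions happen to agree on the vector in the question, but differ on other input. The vector `w` is a made-up example chosen to show the difference.

```r
v <- c(10, 30, 20, 50, 40)   # vector from the question: both results agree
rank(v)
order(v)

w <- c(30, 10, 20)           # hypothetical vector where they differ
r <- rank(w)    # the position each element would take after sorting
o <- order(w)   # the indices that would sort the vector
w[o]            # indexing by order() gives the sorted vector
```
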
57 votes, 4 answers

Why do I get "warning longer object length is not a multiple of shorter object length"?

I have dataframe dih_y2. These two lines give me a warning: > memb = dih_y2$MemberID[1:10] > dih_col = which(dih_y2$MemberID == memb) Warning message: In dih_y2$MemberID == memb : longer object length is not a multiple of shorter object…
ashim
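
A likely cause, sketched under assumptions: `==` compares element by element and recycles the shorter vector, whereas `%in%` tests membership. The vector `ids` below is a made-up stand-in for `dih_y2$MemberID`.

```r
# Hypothetical stand-in for dih_y2$MemberID
ids  <- c(5, 7, 9, 11, 13, 15, 17)
memb <- ids[1:3]

# `==` recycles memb along ids; 7 is not a multiple of 3, hence the warning
# which(ids == memb)

# %in% asks "is each id one of memb?", so the lengths need not match
hits <- which(ids %in% memb)
hits
```
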
56 votes, 1 answer

Implementing standard software design patterns (focus on MVC) in R

Currently, I'm reading a lot about Software Engineering, Software Design, Design Patterns etc. Coming from a totally different background, that's all new fascinating stuff to me, so please bear with me in case I'm not using the correct technical…
Rappster
56 votes, 6 answers

Is there a dictionary functionality in R

Is there a way to create a "dictionary" in R, such that it has pairs? Something to the effect of: x=dictionary(c("Hi","Why","water") , c(1,5,4)) x["Why"]=5 I'm asking this because I am actually looking for a two categorical variables function. So…
eran
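
Two base-R approximations of a dictionary, offered as a sketch rather than as the thread's answer: a named vector for small lookups, and an environment for larger ones. The key `"new"` and the updated values are placeholders.

```r
# A named vector works as a simple key/value lookup
x <- c(Hi = 1, Why = 5, water = 4)
x["Why"]          # look up by key
x["Why"] <- 7     # update an existing key
x["new"] <- 2     # add a key

# An environment stores key/value pairs by name
d <- new.env()
assign("Why", 5, envir = d)
get("Why", envir = d)
exists("water", envir = d, inherits = FALSE)   # key not present in d
```
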
56 votes, 8 answers

Java-R integration?

I have a Java app which needs to perform partial least squares regression. It would appear there are no Java implementations of PLSR out there. Weka might have had something like it at some point, but it is no longer in the API. On the other hand, I…
mbatchkarov
56 votes, 2 answers

Roxygen2 - how to properly document S3 methods

I've read the Roxygen2 PDF and this site, and I'm lost on the difference between @method @S3method @export and how to use these to properly document S3 methods. I worked up the following example for discussion: How would I properly document these? How…
Suraj
56 votes, 8 answers

Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form:

    > head(data)
      Group Score Info
    1     1     1    a
    2     1     2    b
    3     1     3    c
    4     2     4    d
    5     2     3    e
    6     2     1    f

I would like to aggregate it following the Score column…
jul635
56 votes, 3 answers

How I can select rows from a dataframe that do not match?

I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this. # make data frame a <- data.frame( x = c(1,2,3,4)) b <- data.frame( y = c(1,2,3,4,5,6)) # select only values from b that are not in 'a' #…
djq
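
One way to express the selection in base R, sketched with the data frames from the question; the name `not_in_a` is introduced here for illustration.

```r
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 6))

# Keep the rows of b whose y value does not appear in a$x;
# drop = FALSE keeps the result a data frame even with one column
not_in_a <- b[!(b$y %in% a$x), , drop = FALSE]
not_in_a$y
```
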
56 votes, 2 answers

Reverse stacked bar order

I'm creating a stacked bar chart using ggplot like this: plot_df <- df[!is.na(df$levels), ] ggplot(plot_df, aes(group)) + geom_bar(aes(fill = levels), position = "fill") Which gives me something like this: How do I reverse the order the stacked…
Simon
56 votes, 4 answers

How to round a number and make it show zeros?

The common code in R for rounding a number to say 2 decimal points is: > a = 14.1234 > round(a, digits=2) > a > 14.12 However if the number has zeros as the first two decimal digits, R suppresses zeros in display: > a = 14.0034 > round(a,…
M. Er
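
A sketch of the usual distinction, not taken from the thread: `round()` returns a number, so trailing zeros are dropped on printing, while `formatC()` and `sprintf()` return text with a fixed number of decimals.

```r
a <- 14.0034

rounded <- round(a, digits = 2)                    # still numeric, prints without trailing zeros
fixed1  <- formatC(a, format = "f", digits = 2)    # character, two decimals kept
fixed2  <- sprintf("%.2f", a)                      # character, two decimals kept

fixed1
fixed2
```

Because the formatted results are character strings, they suit display but not further arithmetic.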
56 votes, 7 answers

SparkR vs sparklyr

Does someone have an overview with respect to advantages/disadvantages of SparkR vs sparklyr? Google does not yield any satisfactory results and both seem fairly similar. Trying both out, SparkR appears a lot more cumbersome, whereas sparklyr is…
koVex
56 votes, 3 answers

R summary() equivalent in numpy

Is there an equivalent of R's summary() function in numpy? numpy has std, mean, average functions separately, but does it have a function that sums up everything, like summary does in R? I found this question which relates to pandas and this…
iulian
56 votes, 2 answers

Override column types when importing data using readr::read_csv() when there are many columns

I am trying to read a csv file using readr::read_csv in R. The csv file that I am importing has about 150 columns, I am just including the first few columns for the example. I am looking to override the second column from the default type (which is…
rajvijay
56 votes, 3 answers

list output truncated - How to expand listed variables with str() in R

I have a data.frame df with 600+ variables. I'm writing a function that automates the creation of columns and need to visually check them once. The str function provides a good summary: str(df) 'data.frame': 29 obs. of 602 variables: $…
jpinelo
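
A possible approach, sketched on a made-up wide data frame: `str()` takes a `list.len` argument that limits how many list elements it prints, so raising it to the number of columns should show them all. The 150-column `df` here is a hypothetical stand-in for the 602-variable one in the question.

```r
# Hypothetical wide data frame standing in for the one in the question
df <- as.data.frame(matrix(0, nrow = 2, ncol = 150))

# With the default list.len, the listing is cut off for wide data frames
short <- capture.output(str(df))

# Raising list.len to the number of columns lists every variable
full <- capture.output(str(df, list.len = ncol(df)))

length(short)
length(full)
```
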