Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
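
As a rough illustration of the verbs named in the tag description (select, filter, group_by, and summarize), here is a minimal sketch on an in-memory data frame. The `sales` data and its column names are hypothetical, invented only for this example; it is one plausible way to chain the verbs, not a canonical recipe.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  units  = c(10, 20, 5, 15, 40),
  price  = c(2.5, 2.5, 3.0, 3.0, 3.0)
)

result <- sales %>%
  select(region, units) %>%   # keep only the columns needed
  filter(units >= 10) %>%     # drop rows with fewer than 10 units
  group_by(region) %>%        # form one group per region
  summarize(
    total_units = sum(units), # total units per region
    n_rows      = n()         # number of rows kept per region
  )

print(result)
# north: 30 units over 2 rows; south: 55 units over 2 rows
```

The same pipeline can, in principle, be pointed at a database-backed table instead of a data frame, which is the idea behind the third goal above; that variant needs a database backend and is not shown here.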

Vignettes

Some vignettes have been moved to other related packages.

36044 questions
48
votes
6 answers

Pass a vector of variable names to arrange() in dplyr

I want to pass arrange() {dplyr} a vector of variable names to sort on. Usually I just type in the variables I want, but I'm trying to make a function where the sorting variables can be input as a function parameter. df <- structure(list(var1 =…
rsoren
48
votes
6 answers

Proper idiom for adding zero count rows in tidyr/dplyr

Suppose I have some count data that looks like this: library(tidyr) library(dplyr) X.raw <- data.frame( x = as.factor(c("A", "A", "A", "B", "B", "B")), y = as.factor(c("i", "ii", "ii", "i", "i", "i")), z = 1:6 ) X.raw # x y z # 1 A i 1 #…
pete
48
votes
3 answers

Is cut() style binning available in dplyr?

Is there a way to do something like a cut() function for binning numeric values in a dplyr table? I'm working on a large postgres table and can currently either write a case statement in the sql at the outset, or output unaggregated data and apply…
Michael Williams
47
votes
3 answers

Use filter in dplyr conditional on an if statement in R

Let me share an example of what I'm trying to do, since the title may not be as clear as I'd like it to be. This doesn't have reproducible code, but I can add a reproducible example if that will help: library(dplyr) if(this_team != "") { newdf <-…
Canovice
47
votes
2 answers

How to use dplyr::mutate_all for rounding selected columns

I'm using the following package version # devtools::install_github("hadley/dplyr") > packageVersion("dplyr") [1] ‘0.5.0.9001’ With the following tibble: library(dplyr) df <- structure(list(gene_symbol = structure(1:6, .Label = c("0610005C13Rik",…
neversaint
47
votes
7 answers

Using dplyr to conditionally replace values in a column

I have an example data set with a column that reads somewhat like this: Candy Sanitizer Candy Water Cake Candy Ice Cream Gum Candy Coffee What I'd like to do is replace it into just two factors - "Candy" and "Non-Candy". I can do this with…
user2762934
47
votes
8 answers

dplyr - using mutate() like rowmeans()

I can't find the answer anywhere. I would like to calculate new variable of data frame which is based on mean of rows. For example: data <- data.frame(id=c(101,102,103), a=c(1,2,3), b=c(2,2,2), c=c(3,3,3)) I want to use mutate to make variable d…
Tomasz Wojtas
47
votes
4 answers

dplyr: put count occurrences into new variable

Would like to get a hand on dplyr code, but cannot figure this out. Have seen a similar issue described here for many variables (summarizing counts of a factor with dplyr and Putting rowwise counts of value occurences into new variables, how to do…
user3375672
47
votes
4 answers

filtering data.frame based on row_number()

UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted. I'm trying to get the second to the seventh line in a data.frame using dplyr. I'm doing this: require(dplyr) df <- data.frame(id = 1:10, var =…
Daniel Falbel
47
votes
9 answers

Use variable names in functions of dplyr

I want to use variable names as strings in functions of dplyr. See the example below: df <- data.frame( color = c("blue", "black", "blue", "blue", "black"), value = 1:5) filter(df, color == "blue") It works perfectly, but I would like…
kuba
46
votes
1 answer

How might I get detailed database error messages from dplyr::tbl?

I'm using R to plot some data I pull out of a database (the Stack Exchange data dump, to be specific): dplyr::tbl(serverfault, dbplyr::sql(" select year(p.CreationDate) year, avg(p.AnswerCount*1.0) answers_per_question, …
Jon 'links in bio' Ericson
46
votes
6 answers

dplyr filter with condition on multiple columns

I'd like to remove rows corresponding to a particular combination of variables from my data frame. Here's a dummy data : father<- c(1, 1, 1, 1, 1) mother<- c(1, 1, 1, NA, NA) children <- c(NA, NA, 2, 5, 2) cousins <- c(NA, 5, 1, 1, 4) dataset…
Wilcar
46
votes
4 answers

dplyr - groupby on multiple columns using variable names

I am working with R Shiny for some exploratory data analysis. I have two checkbox inputs that contain only the user-selected options. The first checkbox input contains only the categorical variables; the second checkbox contains only numeric…
Neil
46
votes
9 answers

Error message when running simple 'rename' function in R

Below is a very simple data frame example I found on the internet. Running this in RStudio on my machine produces an error message: Error: All arguments to rename must be named. The rename function seems to be straightforward but doesn't work for…
Mike
46
votes
3 answers

How to melt and cast dataframes using dplyr?

Recently I am doing all my data manipulations using dplyr and it is an excellent tool for that. However I am unable to melt or cast a data frame using dplyr. Is there any way to do that? Right now I am using reshape2 for this purpose. I want 'dplyr'…
Koundy