Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
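
As a minimal sketch of the verbs named in the tag description (filter, select, group_by, summarize), assuming dplyr is installed and using R's built-in mtcars data. The column choices and the name `result` are illustrative only:

```r
library(dplyr)

# Keep 4- and 6-cylinder cars, pick three columns,
# then compute the mean mpg and the row count per cylinder class.
result <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  select(cyl, mpg, hp) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), n = n())

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps chain with `%>%`.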

36044 questions
162 votes, 6 answers

How to select the rows with maximum values in each group with dplyr?

I would like to select a row with maximum value in each group with dplyr. Firstly I generate some random data to show my question set.seed(1) df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5)) df$value <- runif(nrow(df)) In plyr, I could use a…
Bangyou
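
One possible approach, sketched with the data from the excerpt. The excerpt is truncated, so the grouping columns (`A` and `B`) and the name `top_rows` are assumptions made for illustration:

```r
library(dplyr)

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

# Assumption: a "group" is each combination of A and B.
# Keep the row(s) whose value equals the group maximum.
top_rows <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  ungroup()

head(top_rows)
```

Note that `filter(value == max(value))` keeps every tied row; if exactly one row per group is needed, a tie-breaking step would have to be added.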

154 votes, 2 answers

Can dplyr join on multiple columns or composite key?

I realize that dplyr v3.0 allows you to join on different variables: left_join(x, y, by = c("a" = "b")) will match x.a to y.b. However, is it possible to join on a combination of variables or do I have to add a composite key beforehand? Something like…
JasonAizkalns
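
A hedged sketch of a join on two columns at once. The `by` argument accepts a named character vector with more than one entry, so no composite key column is needed. The data frames and column names below are made up for illustration:

```r
library(dplyr)

# Hypothetical tables: x is keyed by (a, c), y is keyed by (b, d).
x <- data.frame(a = c(1, 1, 2), c = c("u", "v", "u"), x_val = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(b = c(1, 2), d = c("u", "u"), y_val = c(10, 20),
                stringsAsFactors = FALSE)

# Match x$a to y$b AND x$c to y$d in a single join.
joined <- left_join(x, y, by = c("a" = "b", "c" = "d"))

print(joined)
```

Rows of `x` without a match on both columns get `NA` in `y_val`, as with any left join.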

147 votes, 2 answers

Change value of variable with dplyr

I regularly need to change the values of a variable based on the values on a different variable, like this: mtcars$mpg[mtcars$cyl == 4] <- NA I tried doing this with dplyr but failed miserably: mtcars %>% mutate(mpg = mpg == NA[cyl == 4])…
luciano
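
One way this could be written with `mutate`, shown as a sketch rather than the accepted answer. It uses base R's `replace()` inside the call; the name `out` is illustrative:

```r
library(dplyr)

# Set mpg to NA wherever cyl equals 4, leaving other rows unchanged.
out <- mtcars %>%
  mutate(mpg = replace(mpg, cyl == 4, NA))

head(out)
```

Unlike the base R assignment in the question, this returns a modified copy and leaves `mtcars` itself untouched.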

145 votes, 5 answers

Error: could not find function "%>%"

I'm running an example in R, going through the steps, and everything is working so far except that this code produces an error: words <- dtm %>% as.matrix %>% colnames %>% (function(x) x[nchar(x) < 20]) Error: could not find function "%>%" I…
Haidar
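
This error usually means that no package providing the pipe has been attached. A minimal sketch, assuming dplyr is installed (it re-exports `%>%` from magrittr); the input vector stands in for the `dtm` column names, which are not shown in the excerpt:

```r
library(dplyr)  # attaching dplyr makes %>% available

# Hypothetical stand-in for the column names in the question.
words <- c("alpha", "be", "a_very_long_word_exceeding_twenty_chars") %>%
  (function(x) x[nchar(x) < 20])

print(words)
```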

144 votes, 8 answers

Applying a function to every row of a table using dplyr?

When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row. e.g. data(iris) library(plyr) head( adply(iris, 1, transform , Max.Len= max(Sepal.Length,Petal.Length)) ) …
Stephen Henderson

136 votes, 6 answers

R Conditional evaluation when using the pipe operator %>%

When using the pipe operator %>% with packages such as dplyr, ggvis, dycharts, etc., how do I do a step conditionally? For example: step_1 %>% step_2 %>% if(condition) step_3 These approaches don't seem to work: step_1 %>% step_2 if(condition) %>%…
mindlessgreen

134 votes, 5 answers

Count number of rows by group using dplyr

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me library(plyr) ddply(mtcars,…
charmee
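
A sketch of the SQL-style "count(*) group by" in dplyr. The excerpt is truncated before the grouping columns are named, so `cyl` and `gear` are assumed here purely for illustration:

```r
library(dplyr)

# Shortcut: count() groups by the given columns and tallies rows into `n`.
counts <- mtcars %>%
  count(cyl, gear)

# Equivalent long form with explicit grouping.
counts_long <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(n = n(), .groups = "drop")

print(counts)
```

Only combinations that actually occur in the data appear in the result. The `.groups` argument assumes dplyr 1.0.0 or later.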

127 votes, 4 answers

Pass a string as variable name in dplyr::filter

I'm using the mtcars dataset to illustrate my question. For example, I want to subset data to 4-cyl cars. I can do: mtcars %>% filter(cyl == 4) In my work, I need to pass a string variable as my column name. For example: var <- 'cyl' mtcars %>%…
zesla
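
A hedged sketch of one approach, assuming a reasonably recent dplyr in which the `.data` pronoun is available inside `filter()`. The name `four_cyl` is illustrative:

```r
library(dplyr)

var <- "cyl"

# .data[[var]] looks up the column whose name is stored in `var`.
four_cyl <- mtcars %>%
  filter(.data[[var]] == 4)

nrow(four_cyl)
```

Writing `filter(var == 4)` would instead compare the string "cyl" with 4, which matches no rows.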

127 votes, 10 answers

R dplyr: Drop multiple columns

I have a dataframe and list of columns in that dataframe that I'd like to drop. Let's use the iris dataset as an example. I'd like to drop Sepal.Length and Sepal.Width and use only the remaining columns. How do I do this using select or select_ from…
Navaneethan Santhanam
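
A minimal sketch of dropping the two columns named in the question with `select()`, using negative selection. The name `trimmed` is illustrative:

```r
library(dplyr)

# A minus sign in select() removes the listed columns and keeps the rest.
trimmed <- iris %>%
  select(-c(Sepal.Length, Sepal.Width))

names(trimmed)
```

When the column names are held in a character vector instead, newer dplyr versions (1.0.0 and later, as far as I know) offer `select(-all_of(cols))` for the same purpose.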

125 votes, 7 answers

Filter for complete cases in data.frame using dplyr (case-wise deletion)

Is it possible to filter a data.frame for complete cases using dplyr? complete.cases with a list of all variables works, of course. But that is a) verbose when there are a lot of variables and b) impossible when the variable names are not known…
user2503795

123 votes, 7 answers

Replacement for "rename" in dplyr

I like plyr's renaming function rename. I have recently started using dplyr, and was wondering if there is an easy way to rename variables using a function from dplyr that is as easy to use as plyr's rename?
vergilcw

121 votes, 2 answers

How to specify names of columns for x and y when joining in dplyr?

I have two data frames that I want to join using dplyr. One is a data frame containing first names. test_data <- data.frame(first_name = c("john", "bill", "madison", "abby", "zzz"), stringsAsFactors = FALSE) The other data…
Lincoln Mullen

119 votes, 3 answers

dplyr mutate with conditional values

In a large dataframe ("myfile") with four columns I have to add a fifth column with values conditionally based on the first four columns. Prefer answers with dplyr and mutate, mainly because of its speed in large datasets. My dataframe looks like…
rdatasculptor

118 votes, 8 answers

Find duplicated elements with dplyr

I tried using the code presented here to find ALL duplicated elements with dplyr like this: library(dplyr) mtcars %>% mutate(cyl.dup = cyl[duplicated(cyl) | duplicated(cyl, from.last = TRUE)]) How can I convert code presented here to find ALL…
luciano

118 votes, 8 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…
Ed S