Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
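As a rough illustration of the verbs named in the tag description, the sketch below chains select, filter, group_by, and summarize on the built-in mtcars data set. The column choices, the 100 hp threshold, and the output names `mean_mpg` and `n_cars` are assumptions made for this example only; they do not come from the tag wiki.

```r
library(dplyr)

# mtcars ships with base R, so no external data is needed.
result <- mtcars %>%
  select(cyl, mpg, hp) %>%        # keep three columns
  filter(hp > 100) %>%            # keep cars with more than 100 horsepower
  group_by(cyl) %>%               # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),         # average fuel economy within each group
    n_cars   = n()                # number of rows within each group
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with the pipe.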

36044 questions
118 votes, 4 answers

dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories…
eipi10
116 votes, 6 answers

Getting the top values by group

Here's a sample data frame: d <- data.frame( x = runif(90), grp = gl(3, 30) ) I want the subset of d containing the rows with the top 5 values of x for each value of grp. Using base-R, my approach would be something like: ordered <-…
Richie Cotton
116 votes, 5 answers

Gather multiple sets of columns

I have data from an online survey where respondents go through a loop of questions 1-3 times. The survey software (Qualtrics) records this data in multiple columns—that is, Q3.2 in the survey will have columns Q3.2.1., Q3.2.2., and Q3.2.3.: df <-…
Andrew
112 votes, 5 answers

Select columns based on string match - dplyr::select

I have a data frame ("data") with lots and lots of columns. Some of the columns contain a certain string ("search_string"). How can I use dplyr::select() to give me a subset including only the columns that contain the string? I tried: # columns as…
Timm S.
108 votes, 1 answer

R spreading multiple columns with tidyr

Take this sample variable df <- data.frame(month=rep(1:3,2), student=rep(c("Amy", "Bob"), each=3), A=c(9, 7, 6, 8, 6, 9), B=c(6, 7, 8, 5, 6, 7)) I can use spread from tidyr to change this to wide…
Ricky
107 votes, 12 answers

dplyr mutate/replace several columns on a subset of rows

I'm in the process of trying out a dplyr-based workflow (rather than using mostly data.table, which I'm used to), and I've come across a problem that I can't find an equivalent dplyr solution to. I commonly run into the scenario where I need to…
Chris Newton
104 votes, 15 answers

How to get summary statistics by group

I'm trying to get multiple summary statistics in R/S-PLUS grouped by categorical column in one shot. I found couple of functions, but all of them do one statistic per call, like aggregate(). data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,…
user1289220
102 votes, 7 answers

Filter multiple values on a string column in dplyr

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing? Example: data.frame name = dat days name 88 …
Tom O
100 votes, 6 answers

dplyr: "Error in n(): function should not be called directly"

I am attempting to reproduce one of the examples in the dplyr package but am getting this error message. I am expecting to see a new column n produced with the frequency of each combination. What am I missing? I triple checked that the package is…
Michael Bellhouse
96 votes, 4 answers

Use pipe operator %>% with replacement functions like colnames()<-

How can I use the pipe operator to pipe into replacement function like colnames()<- ? Here's what I'm trying to do: library(dplyr) averages_df <- group_by(mtcars, cyl) %>% summarise(mean(disp), mean(hp)) colnames(averages_df) <- c("cyl",…
Alex Coppock
96 votes, 4 answers

dplyr on data.table, am I really using data.table?

If I use dplyr syntax on top of a datatable, do I get all the speed benefits of datatable while still using the syntax of dplyr? In other words, do I mis-use the datatable if I query it with dplyr syntax? Or do I need to use pure datatable syntax to…
Polymerase
95 votes, 5 answers

R move column to last using dplyr

For a data.frame with n columns, I would like to be able to move a column from any of 1-(n-1) positions, to be the nth column (i.e. a non-last column to be the last column). I would also like to do it using dplyr. I would like to do so without…
dule arnaux
95 votes, 2 answers

Get dplyr count of distinct in a readable way

I'm new using dplyr, I need to calculate the distinct values in a group. Here's a table example: data <- data.frame(aa = c(1, 2, 3, 4, NA), bb = c('a', 'b', 'a', 'c', 'c')) I know I can do things like: library(dplyr) by_bb <-…
GabyLP
91 votes, 9 answers

dplyr change many data types

I have a data.frame: dat <- data.frame(fac1 = c(1, 2), fac2 = c(4, 5), fac3 = c(7, 8), dbl1 = c('1', '2'), dbl2 = c('4', '5'), dbl3 = c('6', '7') …
ckluss
90 votes, 1 answer

Removing NA in dplyr pipe

I tried to remove NA's from the subset using dplyr piping. Is my answer an indication of a missed step. I'm trying to learn how to write functions using dplyr: > outcome.df%>% + group_by(Hospital,State)%>% +…
ITCoderWhiz