Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The R package dplyr is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
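
As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarise), the sketch below chains them on R's built-in mtcars dataset. The dataset, the hp threshold, and the names by_cyl, mean_mpg and n_cars are assumptions chosen for this example; they are not part of the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; it is used here only as a stand-in dataset.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%        # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%    # keep only the columns of interest
  group_by(cyl) %>%           # form one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),     # average fuel economy within each group
    n_cars   = n()            # number of rows within each group
  )

print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with the pipe.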

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36044 questions

118

votes

4 answers

dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories…

r dplyr plyr tidyr

asked Mar 20 '14 at 03:52

eipi10

116

votes

6 answers

Getting the top values by group

Here's a sample data frame: d <- data.frame( x = runif(90), grp = gl(3, 30) ) I want the subset of d containing the rows with the top 5 values of x for each value of grp. Using base R, my approach would be something like: ordered <-…

r data.table dplyr

asked Jan 04 '15 at 13:36

Richie Cotton

116

votes

5 answers

Gather multiple sets of columns

I have data from an online survey where respondents go through a loop of questions 1-3 times. The survey software (Qualtrics) records this data in multiple columns—that is, Q3.2 in the survey will have columns Q3.2.1., Q3.2.2., and Q3.2.3.: df <-…

r reshape dplyr qualtrics tidyr

asked Sep 19 '14 at 02:41

Andrew

112

votes

5 answers

Select columns based on string match - dplyr::select

I have a data frame ("data") with lots and lots of columns. Some of the columns contain a certain string ("search_string"). How can I use dplyr::select() to give me a subset including only the columns that contain the string? I tried: # columns as…

r regex dplyr

asked Sep 18 '14 at 22:24

Timm S.

108

votes

1 answer

R spreading multiple columns with tidyr

Take this sample variable df <- data.frame(month=rep(1:3,2), student=rep(c("Amy", "Bob"), each=3), A=c(9, 7, 6, 8, 6, 9), B=c(6, 7, 8, 5, 6, 7)) I can use spread from tidyr to change this to wide…

r dataframe dplyr tidyr

asked Jun 02 '15 at 09:22

Ricky

107

votes

12 answers

dplyr mutate/replace several columns on a subset of rows

I'm in the process of trying out a dplyr-based workflow (rather than using mostly data.table, which I'm used to), and I've come across a problem that I can't find an equivalent dplyr solution to. I commonly run into the scenario where I need to…

r data.table dplyr

asked Dec 04 '15 at 19:39

Chris Newton

104

votes

15 answers

How to get summary statistics by group

I'm trying to get multiple summary statistics in R/S-PLUS grouped by a categorical column in one shot. I found a couple of functions, but all of them do one statistic per call, like aggregate(). data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,…

r dplyr stat summarize r-faq

asked Mar 23 '12 at 22:04

user1289220

102

votes

7 answers

Filter multiple values on a string column in dplyr

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing? Example: data.frame name = dat days name 88 …

r dplyr string-matching multiple-conditions

asked Sep 03 '14 at 14:51

Tom O

100

votes

6 answers

dplyr: "Error in n(): function should not be called directly"

I am attempting to reproduce one of the examples in the dplyr package but am getting this error message. I am expecting to see a new column n produced with the frequency of each combination. What am I missing? I triple checked that the package is…

r function plyr dplyr conflicting-libraries

asked Apr 02 '14 at 03:44

Michael Bellhouse

votes

4 answers

Use pipe operator %>% with replacement functions like colnames()<-

How can I use the pipe operator to pipe into a replacement function like colnames()<- ? Here's what I'm trying to do: library(dplyr) averages_df <- group_by(mtcars, cyl) %>% summarise(mean(disp), mean(hp)) colnames(averages_df) <- c("cyl",…

r dplyr pipe magrittr

asked Jan 22 '15 at 23:51

Alex Coppock

votes

4 answers

dplyr on data.table, am I really using data.table?

If I use dplyr syntax on top of a data.table, do I get all the speed benefits of data.table while still using the syntax of dplyr? In other words, do I misuse the data.table if I query it with dplyr syntax? Or do I need to use pure data.table syntax to…

r data.table dplyr

asked Dec 16 '14 at 18:35

Polymerase

votes

5 answers

R move column to last using dplyr

For a data.frame with n columns, I would like to be able to move a column from any of 1-(n-1) positions, to be the nth column (i.e. a non-last column to be the last column). I would also like to do it using dplyr. I would like to do so without…

r dplyr

asked May 10 '17 at 16:14

dule arnaux

votes

2 answers

Get dplyr count of distinct in a readable way

I'm new to dplyr, and I need to calculate the distinct values in a group. Here's a table example: data <- data.frame(aa = c(1, 2, 3, 4, NA), bb = c('a', 'b', 'a', 'c', 'c')) I know I can do things like: library(dplyr) by_bb <-…

r dataframe dplyr

asked Nov 03 '14 at 18:12

GabyLP

votes

9 answers

dplyr change many data types

I have a data.frame: dat <- data.frame(fac1 = c(1, 2), fac2 = c(4, 5), fac3 = c(7, 8), dbl1 = c('1', '2'), dbl2 = c('4', '5'), dbl3 = c('6', '7') …

r dataframe dplyr

asked Dec 27 '14 at 14:38

ckluss

votes

1 answer

Removing NA in dplyr pipe

I tried to remove NAs from the subset using dplyr piping. Is my answer an indication of a missed step? I'm trying to learn how to write functions using dplyr: > outcome.df%>% + group_by(Hospital,State)%>% +…

r dplyr na

asked Oct 30 '14 at 23:43

ITCoderWhiz
