Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package for R is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
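As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarize), here is a minimal sketch. It assumes dplyr is installed and uses R's built-in mtcars data set; the threshold of 100 horsepower and the output column names mean_mpg and n_cars are arbitrary choices for this example, not something the tag wiki prescribes.

```r
library(dplyr)

# mtcars ships with base R; hp, cyl and mpg are columns of that data set.
result <- mtcars %>%
  filter(hp > 100) %>%              # keep cars with more than 100 horsepower
  select(cyl, mpg) %>%              # keep only the columns used below
  group_by(cyl) %>%                 # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),           # average fuel economy per group
    n_cars   = n()                  # number of cars per group
  )

print(result)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain with the pipe.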

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36044 questions

3 answers

How to replace all NA in a dataframe using tidyr::replace_na?

I'm trying to fill all NAs in my data with 0's. Does anyone know how to do that using replace_na from tidyr? From the documentation, we can easily replace NAs in different columns with different values. But how do I replace all of them with a single value? I…

r dplyr tidyr

asked Aug 08 '17 at 19:40

zesla

5 answers

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I have a data frame that looks like this:

#df
ID DRUG FED AUC0t Tmax Cmax
 1    1   0   100    5   20
 2    1   1   200    6   25
 3    0   1    NA    2   30
 4    0   0   150    6   65

And so on. I want to summarize some…

r plyr dplyr shadowing name-collision

asked Nov 14 '14 at 06:00

Amer

8 answers

Fitting several regression models with dplyr

I would like to fit a model for each hour (the factor variable) using dplyr. I'm getting an error, and I'm not quite sure what's wrong. df.h <- data.frame( hour = factor(rep(1:24, each = 21)), price = runif(504, min = -10, max = 125), …

r dplyr

asked Mar 28 '14 at 12:47

Thorst

4 answers

Summarize all group values and a conditional subset in the same call

I'll illustrate my question with an example. Sample data: df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202)) df ID A B 1 1 foo 1 2 1 bar 5 3 2 foo 7 4 2 foo …

r dplyr sqldf

asked May 07 '14 at 21:33

kevinykuo

6 answers

dplyr::select function clashes with MASS::select

If I load the MASS package: library(MASS) and then try to run dplyr::select, I get an error: library(dplyr) mtcars %.% select(mpg) # Error in select(`__prev`, mpg) : unused argument (mpg) How can I use dplyr::select with the MASS package loaded?

r dplyr

asked Jun 13 '14 at 09:32

luciano

6 answers

dplyr: how to reference columns by column index rather than column name using mutate?

Using dplyr, you can do something like this: iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum 1 5.1 3.5 1.4 0.2 setosa 8.6 2 4.9 …

r dplyr

asked Sep 16 '15 at 21:08

Alby

4 answers

Concatenate strings by group with dplyr

I have a dataframe that looks like this:

> data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'))
> data
  foo bar
1   1   a
2   1   b
3   2   a
4   3   b
5   3   c
6   3   d

I would like to create a new column bars_by_foo…

r dplyr

asked Jul 21 '16 at 21:54

crf

3 answers

How to deal with nonstandard column names (white space, punctuation, starts with numbers)

df <- structure(list(`a a` = 1:3, `a b` = 2:4), .Names = c("a a", "a b" ), row.names = c(NA, -3L), class = "data.frame")

and the data looks like

  a a a b
1   1   2
2   2   3
3   3   4

The following call to select

select(df, 'a a')

gives Error in…

r dplyr r-faq

asked Apr 03 '14 at 15:27

Flux

3 answers

How to group by all but one columns?

How do I tell group_by to group the data by all columns except a given one? With aggregate, it would be aggregate(x ~ ., ...). I tried group_by(data, -x), but that groups by the negative-of-x (i.e. the same as grouping by x).

r dplyr

asked Aug 27 '16 at 12:35

Roman Cheplyaka

4 answers

Conditionally Count in dplyr

I have some member order data that I would like to aggregate by week of order. This is what the data looks like: memberorders=data.frame(MemID=c('A','A','B','B','B','C','C','D'), week = c(1,2,1,4,5,1,4,1), value =…

r dplyr

asked Apr 27 '15 at 20:06

SFuj

5 answers

standard evaluation in dplyr: summarise a variable given as a character string

UPDATE July 2020: dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html The new way to refer…

r dplyr

asked Nov 03 '14 at 22:06

Ajar

2 answers

Using functions of multiple columns in a dplyr mutate_at call

I'd like to use dplyr's mutate_at function to apply a function to several columns in a dataframe, where the function takes as input both the column to which it is directly applied and another column in the dataframe. As a concrete example, I'd like to…

r dplyr

asked Aug 29 '16 at 15:32

bschneidr

11 answers

Using dplyr window functions to calculate percentiles

I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions. Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and…

r dplyr tidyr

asked May 27 '15 at 16:38

dreww2

5 answers

Removing NA observations with dplyr::filter()

My data looks like this: library(tidyverse) df <- tribble( ~a, ~b, ~c, 1, 2, 3, 1, NA, 3, NA, 2, 3 ) I can remove all NA observations with drop_na(): df %>% drop_na() Or remove all NA observations in a single column (a for…

r dplyr

asked Mar 04 '15 at 14:59

emehex

7 answers

Pass arguments to dplyr functions

I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width: library(dplyr) iris %>% group_by(Sepal.Length) %>% …

r dplyr lazy-evaluation

asked Jan 15 '15 at 23:50

asnr
