Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
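
As a rough illustration of the functions named above (filter, select, group_by, and summarize), here is a minimal sketch. The `sales` data frame and its column names are hypothetical, invented for this example only; this is one possible usage under those assumptions, not a canonical recipe.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  units  = c(10, 0, 5, 7, 3),
  price  = c(2.5, 2.5, 3.0, 3.0, 3.0)
)

by_region <- sales %>%
  filter(units > 0) %>%        # keep rows with at least one unit sold
  select(region, units) %>%    # keep only the columns used below
  group_by(region) %>%         # form one group per region
  summarize(
    total_units = sum(units),  # total units within each region
    n_rows = n()               # number of rows that survived the filter
  )

print(by_region)
```

The same chain of calls is intended to work on other dplyr-supported sources, which is the point of the third goal, though details can differ by backend.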

36044 questions
75 votes, 3 answers

How to replace all NA in a dataframe using tidyr::replace_na?

I'm trying to fill all NAs in my data with 0s. Does anyone know how to do that using replace_na from tidyr? From the documentation, we can easily replace NAs in different columns with different values. But how do I replace all of them with a single value? I…
zesla (reputation 11,155; badges 16, 82, 147)
75 votes, 5 answers

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I have a data frame that looks like this:

  #df
  ID DRUG FED AUC0t Tmax Cmax
   1    1   0   100    5   20
   2    1   1   200    6   25
   3    0   1    NA    2   30
   4    0   0   150    6   65

And so on. I want to summarize some…
Amer (reputation 2,131; badges 3, 23, 38)
75 votes, 8 answers

Fitting several regression models with dplyr

I would like to fit a model for each hour (the factor variable) using dplyr. I'm getting an error, and I'm not quite sure what's wrong. df.h <- data.frame( hour = factor(rep(1:24, each = 21)), price = runif(504, min = -10, max = 125), …
Thorst (reputation 1,590; badges 1, 21, 35)
74 votes, 4 answers

Summarize all group values and a conditional subset in the same call

I'll illustrate my question with an example. Sample data: df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202)) df ID A B 1 1 foo 1 2 1 bar 5 3 2 foo 7 4 2 foo …
kevinykuo (reputation 4,600; badges 5, 23, 31)
73 votes, 6 answers

dplyr::select function clashes with MASS::select

If I load the MASS package: library(MASS) and then try to run dplyr::select, I get an error: library(dplyr) mtcars %.% select(mpg) # Error in select(`__prev`, mpg) : unused argument (mpg) How can I use dplyr::select with the MASS package loaded?
luciano (reputation 13,158; badges 36, 90, 130)
70 votes, 6 answers

dplyr: how to reference columns by column index rather than column name using mutate?

Using dplyr, you can do something like this: iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum 1 5.1 3.5 1.4 0.2 setosa 8.6 2 4.9 …
Alby (reputation 5,522; badges 7, 41, 51)
69 votes, 4 answers

Concatenate strings by group with dplyr

I have a data frame that looks like this:

  > data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'))
  > data
    foo bar
  1   1   a
  2   1   b
  3   2   a
  4   3   b
  5   3   c
  6   3   d

I would like to create a new column bars_by_foo…
crf (reputation 1,810; badges 3, 15, 23)
68 votes, 3 answers

How to deal with nonstandard column names (white space, punctuation, starts with numbers)

df <- structure(list(`a a` = 1:3, `a b` = 2:4), .Names = c("a a", "a b"), row.names = c(NA, -3L), class = "data.frame")

and the data looks like

    a a a b
  1   1   2
  2   2   3
  3   3   4

The following call to select, select(df, 'a a'), gives Error in…
Flux (reputation 815; badges 1, 6, 6)
67 votes, 3 answers

How to group by all but one columns?

How do I tell group_by to group the data by all columns except a given one? With aggregate, it would be aggregate(x ~ ., ...). I tried group_by(data, -x), but that groups by the negative-of-x (i.e. the same as grouping by x).
Roman Cheplyaka (reputation 37,738; badges 7, 72, 121)
67 votes, 4 answers

Conditionally Count in dplyr

I have some member order data that I would like to aggregate by week of order. This is what the data looks like: memberorders=data.frame(MemID=c('A','A','B','B','B','C','C','D'), week = c(1,2,1,4,5,1,4,1), value =…
SFuj (reputation 925; badges 1, 9, 14)
66 votes, 5 answers

standard evaluation in dplyr: summarise a variable given as a character string

UPDATE July 2020: dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html The new way to refer…
Ajar (reputation 1,786; badges 2, 15, 23)
65 votes, 2 answers

Using functions of multiple columns in a dplyr mutate_at call

I'd like to use dplyr's mutate_at function to apply a function to several columns in a data frame, where the function takes as input both the column to which it is directly applied and another column in the data frame. As a concrete example, I'd like to…
bschneidr (reputation 6,014; badges 1, 37, 52)
65 votes, 11 answers

Using dplyr window functions to calculate percentiles

I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions. Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and…
dreww2 (reputation 1,551; badges 3, 16, 18)
65 votes, 5 answers

Removing NA observations with dplyr::filter()

My data looks like this: library(tidyverse) df <- tribble( ~a, ~b, ~c, 1, 2, 3, 1, NA, 3, NA, 2, 3 ) I can remove all NA observations with drop_na(): df %>% drop_na() Or remove all NA observations in a single column (a for…
emehex (reputation 9,874; badges 10, 54, 100)
65 votes, 7 answers

Pass arguments to dplyr functions

I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width: library(dplyr) iris %>% group_by(Sepal.Length) %>% …
asnr (reputation 1,692; badges 1, 14, 17)