Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package for R is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
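
As a rough illustration of the verbs named above (filter, select, group_by, summarize), here is a minimal sketch. It assumes only that dplyr is installed and uses the built-in mtcars data set; the threshold of 100 horsepower and the names summary_by_cyl, mean_mpg and n_cars are chosen purely for the example.

```r
library(dplyr)

# mtcars ships with base R, so no external data is assumed.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%       # keep rows where horsepower exceeds 100
  select(cyl, mpg, hp) %>%   # keep only the columns of interest
  group_by(cyl) %>%          # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),    # average miles per gallon within each group
    n_cars   = n()           # number of rows within each group
  )

print(summary_by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is why the steps chain with the pipe.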

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36,044 questions

2 answers

summarise returning -Inf when using na.rm = TRUE

I recently built a simple R script to summarize three different data frames. Since updating to the newest version of R and RStudio, I am running into an output I haven't seen before when using the summarize function in dplyr for only one of the…

r dplyr summarize

asked Sep 18 '17 at 23:29

Matt Jordan

2 answers

Code not working using map from purrr package in R

I'm learning the map function in the purrr package, and the following code is not working: library(purrr) library(dplyr) df1 = data.frame(type1 = c(rep('a',5),rep('b',5)), x = 1:10, y = 11:20) df1 %>% group_by(type1) %>%…

r dplyr purrr

asked Sep 01 '17 at 23:14

Jason

2 answers

Join vectors into dataframe by matching values

I'm trying to compare multiple vectors to see where there are matching values between them. I'd like to combine the vectors into a table where every column either has the same value (for matches) or NA (for no match). For example: list1 <-…

r dataframe merge dplyr

asked Aug 29 '17 at 19:55

Evan

3 answers

Order data frame by the last column with dplyr

library(dplyr) df <- tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10) ) df %>% arrange(colnames(df) %>% tail(1) %>% desc()) I am looping over a list of data frames. There are different columns in the data frames and the…

r dataframe sorting dplyr

asked Aug 29 '17 at 10:13

H. Yong

1 answer

n() acting inconsistently when used in summarise_at()

Using this example data: library(tidyverse) set.seed(123) df <- data_frame(X1 = rep(LETTERS[1:4], 6), X2 = sort(rep(1:6, 4)), ref = sample(1:50, 24), sampl1 = sample(1:50, 24), …

r dplyr tidyr tidyverse purrr

asked Aug 26 '17 at 00:02

G_T

1 answer

Using mutate_at() with negated select helpers, e.g. not one_of()

I have data which looks like this: library(dplyr) set.seed(123) df <- data_frame(X1 = rep(LETTERS[1:4], 6), X2 = rep(1:2, 12), ref = sample(1:50, 24), sampl1 = sample(1:50, 24), …

r dplyr tidyverse

asked Aug 25 '17 at 06:40

G_T

1 answer

How to create factor variables from quosures in functions using ggplot and dplyr?

This is a follow-up to "how to combine ggplot and dplyr into a function?". The issue is how to write a function that uses dplyr and ggplot, and possibly specifies factor variables from quosures. Here is an example dataframe <- data_frame(id =…

r ggplot2 dplyr

asked Aug 22 '17 at 19:59

ℕʘʘḆḽḘ

3 answers

Randomly remove duplicated rows using dplyr()

As a follow-up question to this one: Remove duplicated rows using dplyr, I have the following: How do you randomly remove duplicated rows using dplyr() (among others)? My command now is: data.uniques <- distinct(data, KEYVARIABLE, .keep_all =…

r dplyr

asked Aug 21 '17 at 20:03

Sander W. van der Laan

4 answers

Comparing between groups in grouped dataframe

I am trying to perform a comparison between items in subsequent groups in a dataframe - I guess this is pretty easy when you know what you are doing... My data set can be represented as follows: set.seed(1) data <- data.frame( date =…

r compare dplyr grouping sequential

asked Aug 20 '17 at 06:23

CrustyNoodle

3 answers

Can you make dplyr::mutate and dplyr::lag default = its own input value?

This is similar to this dplyr lag post and this dplyr mutate lag post, but neither of those asks this question about defaulting to the input value. I am using dplyr to mutate a new field that's a lagged offset of another field (that I've converted…

r lag dplyr

asked Aug 18 '17 at 20:23

TheProletariat

3 answers

Fill value backwards from occurrence by group with condition

Problem: I would like to fill a value backwards from occurrence by group with a condition. I am trying to generate column C in the desired output. Set C equal to B and fill 1 backwards if A is <= 35, stop fill if A > 35. I am trying to complete…

r dataframe dplyr

asked Aug 08 '17 at 16:13

BEMR

1 answer

Error in subsetting with $ immediately after a function in dplyr pipe

I could subset a single column with the following syntax for functions that return data.frame or list: library(dplyr) filter(mtcars, disp > 400)$mpg # [1] 10.4 10.4 14.7 But this causes the following error when used in a pipe (%>%): mtcars %>%…

r dplyr

asked Aug 06 '17 at 07:39

mt1022

2 answers

Use dplyr coalesce in programming

I'd like to use dplyr's programming magic, new to version 0.7.0, to coalesce two columns together. Below, I've listed out a few of my attempts. df <- data_frame(x = c(1, 2, NA), y = c(2, NA, 3)) # What I want to do: mutate(df, y = coalesce(x,…

r dplyr rlang

asked Jul 30 '17 at 22:49

karldw

2 answers

Subset common rows from multiple data frames

I have multiple data frames like the ones below, with a unique ID for each row. I am trying to find the common rows and build a new data frame from the rows that appear in at least two of the data frames. For example, the row with Id=2 appears in all three data frames.…

r dataframe data.table dplyr tidyr

asked Jul 28 '17 at 19:13

user6037598

2 answers

How to use values from a previous row and column

I am trying to create a new variable which is a function of previous rows and columns. I have found the lag() function in dplyr but it can't accomplish exactly what I would like. library(dplyr) x = data.frame(replicate(2, sample(1:3,10,rep=TRUE))) …

r dataframe dplyr

asked Jul 25 '17 at 16:21

Lee88
