Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The R package dplyr is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
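
As a minimal sketch of the verbs this tag covers (filter, select, group_by, summarize), the following chains them on R's built-in mtcars data. The hp threshold, the chosen columns, and the name result are illustrative assumptions, not part of the tag description.

```r
library(dplyr)

# Illustrative only: mtcars ships with base R; the threshold and columns are arbitrary.
result <- mtcars %>%
  filter(hp > 100) %>%       # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%   # keep only the columns used below
  group_by(cyl) %>%          # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),    # average fuel economy within each group
    n = n()                  # number of cars in each group
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with %>%.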

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36044 questions

2 answers

Difference between rbind() and bind_rows() in R

On the web, I found that rbind() is used to combine two data frames by rows, and that the bind_rows() function from dplyr performs the same task. What's the difference between these two functions, and which one is more efficient?

r dplyr rbind

asked Mar 19 '17 at 13:34

asad_hussain

7 answers

case_when in mutate pipe

It seems dplyr::case_when doesn't behave as other commands in a dplyr::mutate call. For instance: library(dplyr) case_when(mtcars$carb <= 2 ~ "low", mtcars$carb > 2 ~ "high") %>% table works: . high low 15 17 But put case_when…

r dplyr

asked Jul 29 '16 at 02:16

tomw

8 answers

R dplyr: rename variables using string functions

(Somewhat related question: Enter new column names as string in dplyr's rename function) In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower or gsub,…

regex r rename dplyr

asked May 21 '15 at 19:39

C8H10N4O2

3 answers

Select unique values with 'select' function in 'dplyr' library

Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation. Thanks!

r select unique dplyr

asked Aug 29 '14 at 15:33

nodm

2 answers

Reorder rows using custom order

Given data: library(data.table) DT = data.table(category=LETTERS[1:3], b=1:3) DT # category b # 1: A 1 # 2: B 2 # 3: C 3 Using dplyr, how to rearrange rows to get specific order c("C", "A", "B") in category? # category…

r dplyr

asked Oct 24 '14 at 13:06

Daniel Krizian

4 answers

"Adding missing grouping variables" message in dplyr in R

I have a portion of my script that was running fine before, but recently has been producing an odd statement after which many of my other functions do not work properly. I am trying to select the 8th and 23rd positions in a ranked list of values for…

r dplyr

asked Jul 21 '16 at 18:25

acersaccharum

2 answers

mutate_each / summarise_each in dplyr: how do I select certain columns and give new names to mutated columns?

I'm a bit confused about the dplyr verb mutate_each. It's pretty straightforward to use the basic mutate to transform a column of data into, say, z-scores, and create a new column in your data.frame (here with the name z_score_data): newDF <- DF…

r dataframe dplyr

asked Nov 19 '14 at 21:29

tumultous_rooster

3 answers

Create a ranking variable with dplyr?

Suppose I have the following data df = data.frame(name=c("A", "B", "C", "D"), score = c(10, 10, 9, 8)) I want to add a new column with the ranking. This is what I'm doing: df %>% mutate(ranking = rank(score, ties.method = 'first')) # name score…

r dplyr

asked Sep 29 '14 at 18:25

Ignacio

2 answers

Avoiding type conflicts with dplyr::case_when

I am trying to use dplyr::case_when within dplyr::mutate to create a new variable where I set some values to missing and recode other values simultaneously. However, if I try to set values to NA, I get an error saying that we cannot create the…

r dplyr data-cleaning

asked Jul 03 '17 at 21:14

socialscientist

3 answers

Finding percentage in a sub-group using group_by and summarise

I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr. I have the following data: month type count 1 …

r group-by dplyr

asked Apr 09 '15 at 21:54

KC.

1 answer

How to add a cumulative column to an R dataframe using dplyr?

I have the same question as this post, but I want to use dplyr: With an R dataframe, eg: df <- data.frame(id = rep(1:3, each = 5) , hour = rep(1:5, 3) , value = sample(1:15)) how do I add a cumulative sum column…

r dataframe dplyr

asked Feb 16 '14 at 23:46

Racing Tadpole

4 answers

select columns based on multiple strings with dplyr contains()

I want to select multiple columns based on their names with a regex expression. I am trying to do it with the piping syntax of the dplyr package. I checked the other topics, but only found answers about a single string. With base R: library(dplyr) …

r regex dplyr matching multiple-matches

asked Mar 12 '15 at 19:09

agenis

5 answers

dplyr issues when using group_by(multiple variables)

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation). For example, why when I try to mutate() something does the "group_by" function not work as it's supposed to? Looking at…

r group-by dplyr compound-key

asked Feb 08 '14 at 23:50

Marc Tulla

5 answers

Replace NA with previous or next value, by group, using dplyr

I have a data frame which is arranged in descending order of date. ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23), color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'), age =…

r dplyr missing-data zoo

asked Oct 14 '16 at 10:22

Tarak

3 answers

dplyr: lead() and lag() wrong when used with group_by()

I want to find the lead() and lag() element in each group, but got some wrong results. For example, the data looks like this: library(dplyr) df = data.frame(name=rep(c('Al','Jen'),3), score=rep(c(100, 80, 60),2)) df Data: name score 1 …

r dplyr

asked Jan 30 '15 at 11:36

YJZ
