Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
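
As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarize), here is a minimal sketch. It assumes the built-in mtcars data set; the hp threshold and the output names mpg_by_cyl, mean_mpg and n are arbitrary choices made for this example, not part of the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; the column choices here are illustrative only.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%             # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%         # keep only the columns of interest
  group_by(cyl) %>%                # one group per cylinder count
  summarize(mean_mpg = mean(mpg),  # average fuel economy within each group
            n = n())               # number of cars in each group

print(mpg_by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with %>%.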

36,044 questions

64 votes, 2 answers

Difference between rbind() and bind_rows() in R

On the web, I found that rbind() is used to combine two data frames by rows, and that the same task is performed by the bind_rows() function from dplyr. What's the difference between these two functions, and which one is more efficient?
asad_hussain • 1,959 • 1 • 17 • 27

64 votes, 7 answers

case_when in mutate pipe

It seems dplyr::case_when doesn't behave as other commands in a dplyr::mutate call. For instance: library(dplyr) case_when(mtcars$carb <= 2 ~ "low", mtcars$carb > 2 ~ "high") %>% table works: . high low 15 17 But put case_when…
tomw • 3,114 • 4 • 29 • 51

64 votes, 8 answers

R dplyr: rename variables using string functions

(Somewhat related question: Enter new column names as string in dplyr's rename function) In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower or gsub,…
C8H10N4O2 • 18,312 • 8 • 98 • 134

64 votes, 3 answers

Select unique values with 'select' function in 'dplyr' library

Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation. Thanks!
nodm • 793 • 1 • 6 • 8

63 votes, 2 answers

Reorder rows using custom order

Given data: library(data.table) DT = data.table(category=LETTERS[1:3], b=1:3) DT # category b # 1: A 1 # 2: B 2 # 3: C 3 Using dplyr, how can I rearrange the rows to get the specific order c("C", "A", "B") in category? # category…
Daniel Krizian • 4,586 • 4 • 38 • 75

61 votes, 4 answers

"Adding missing grouping variables" message in dplyr in R

I have a portion of my script that was running fine before, but recently has been producing an odd statement after which many of my other functions do not work properly. I am trying to select the 8th and 23rd positions in a ranked list of values for…
acersaccharum • 665 • 1 • 5 • 6

61 votes, 2 answers

mutate_each / summarise_each in dplyr: how do I select certain columns and give new names to mutated columns?

I'm a bit confused about the dplyr verb mutate_each. It's pretty straightforward to use the basic mutate to transform a column of data into, say, z-scores, and create a new column in your data.frame (here with the name z_score_data): newDF <- DF…
tumultous_rooster • 12,150 • 32 • 92 • 149

61 votes, 3 answers

Create a ranking variable with dplyr?

Suppose I have the following data df = data.frame(name=c("A", "B", "C", "D"), score = c(10, 10, 9, 8)) I want to add a new column with the ranking. This is what I'm doing: df %>% mutate(ranking = rank(score, ties.method = 'first')) # name score…
Ignacio • 7,646 • 16 • 60 • 113

60 votes, 2 answers

Avoiding type conflicts with dplyr::case_when

I am trying to use dplyr::case_when within dplyr::mutate to create a new variable where I set some values to missing and recode other values simultaneously. However, if I try to set values to NA, I get an error saying that we cannot create the…
socialscientist • 3,759 • 5 • 23 • 58

60 votes, 3 answers

Finding percentage in a sub-group using group_by and summarise

I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr. I have the following data: month type count 1 …
KC. • 864 • 1 • 10 • 11

60 votes, 1 answer

How to add a cumulative column to an R dataframe using dplyr?

I have the same question as this post, but I want to use dplyr: With an R dataframe, eg: df <- data.frame(id = rep(1:3, each = 5) , hour = rep(1:5, 3) , value = sample(1:15)) how do I add a cumulative sum column…
Racing Tadpole • 4,270 • 6 • 37 • 56

58 votes, 4 answers

select columns based on multiple strings with dplyr contains()

I want to select multiple columns based on their names with a regular expression. I am trying to do it with the piping syntax of the dplyr package. I checked the other topics, but only found answers about a single string. With base R: library(dplyr) …
agenis • 8,069 • 5 • 53 • 102

58 votes, 5 answers

dplyr issues when using group_by(multiple variables)

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation). For example, why when I try to mutate() something does the "group_by" function not work as it's supposed to? Looking at…
Marc Tulla • 1,751 • 2 • 20 • 34

57 votes, 5 answers

Replace NA with previous or next value, by group, using dplyr

I have a data frame that is arranged in descending order of date. ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23), color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'), age =…
Tarak • 1,035 • 2 • 8 • 14

57 votes, 3 answers

dplyr: lead() and lag() wrong when used with group_by()

I want to find the lead() and lag() element in each group, but got some wrong results. For example, the data is like this: library(dplyr) df = data.frame(name=rep(c('Al','Jen'),3), score=rep(c(100, 80, 60),2)) df Data: name score 1 …
YJZ • 3,934 • 11 • 43 • 67