Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
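
As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarise), here is a minimal sketch. It assumes the built-in mtcars data set, and the `hp > 100` threshold and the `mean_mpg` and `n` column names are arbitrary choices for illustration, not anything prescribed by the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; it is used here only as illustrative data
result <- mtcars %>%
  filter(hp > 100) %>%        # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%    # keep only the columns of interest
  group_by(cyl) %>%           # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n())  # one summary row per group

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes chaining the steps with `%>%` possible.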

Vignettes

Some vignettes have been moved to other related packages.

36044 questions
89 votes, 6 answers

Changing factor levels with dplyr mutate

This is probably simple and I feel stupid for asking. I want to change the levels of a factor in a data frame, using mutate. Simple example: library("dplyr") dat <- data.frame(x = factor("A"), y = 1) mutate(dat,levels(x) = "B") I get: Error:…
user3393472 • 1,151 • 1 • 8 • 12
89 votes, 5 answers

How to create a lag variable within each group?

I have a data.table: require(data.table) set.seed(1) data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7)) data # groups time value # 1: b 1…
xiaodai • 14,889 • 18 • 76 • 140
88 votes, 1 answer

Create new variables with mutate_at while keeping the original ones

Consider this simple example: library(dplyr) library(tibble) dataframe <- tibble(helloo = c(1,2,3,4,5,6), ooooHH = c(1,1,1,2,2,2), ahaaa = c(200,400,120,300,100,100)) # A tibble: 6 x 3 helloo…
ℕʘʘḆḽḘ • 18,566 • 34 • 128 • 235
86 votes, 3 answers

dplyr::select one column and output as vector

dplyr::select returns a data.frame. Is there a way to make it return a vector if the result is one column? Currently, I have to do an extra step (res <- res$y) to convert it from a data.frame to a vector; see this example: #dummy data df <- data.frame(x…
zx8754 • 52,746 • 12 • 114 • 209
85 votes, 1 answer

How to perform multiple left joins using dplyr in R

How do I join multiple data frames in R using dplyr? new <- left_join(x,y, by = "Flag") This is the code I am using to left join x and y. The code doesn't work for multiple joins: new <- left_join(x,y,z by = "Flag")
pranav • 1,041 • 1 • 10 • 11
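
For the multiple-join question above, one common approach is to chain two-table joins, since left_join() combines two tables per call. The sketch below assumes three small made-up data frames named x, y and z (as in the question) that share a "Flag" key. The value columns are hypothetical.

```r
library(dplyr)

# Hypothetical data frames sharing a "Flag" key (names follow the question)
x <- data.frame(Flag = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y <- data.frame(Flag = c(1, 2),    y_val = c("y1", "y2"))
z <- data.frame(Flag = c(2, 3),    z_val = c("z2", "z3"))

# left_join() joins two tables at a time, so chain the calls
new <- x %>%
  left_join(y, by = "Flag") %>%
  left_join(z, by = "Flag")

# Same result for a list of tables, folding with base R's Reduce()
new2 <- Reduce(function(left, right) left_join(left, right, by = "Flag"),
               list(x, y, z))
```

The Reduce() form may be more convenient when the number of tables is not known in advance.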
85 votes, 4 answers

Applying group_by and summarise on data while keeping all the columns' info

I have a large dataset with 22000 rows and 25 columns. I am trying to group my dataset based on one of the columns and take the min value of the other column based on the grouped dataset. However, the problem is that it only gives me two columns…
Momeneh Foroutan • 915 • 1 • 6 • 8
85 votes, 5 answers

dplyr summarise_each with na.rm

Is there a way to instruct dplyr to use summarise_each with na.rm=TRUE? I would like to take the mean of variables with summarise_each("mean") but I don't know how to specify it to ignore missing values.
paljenczy • 4,779 • 8 • 33 • 46
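
As a hedged sketch of one way to get column means that ignore missing values: summarise_each() has since been deprecated, so this example assumes dplyr 1.0.0 or later and uses across() instead. The data frame and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical data with missing values in both columns
df <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 6))

# Apply mean() to every column, passing na.rm = TRUE through a lambda
means <- df %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(means)
```

The lambda form keeps the extra argument next to the function it belongs to, which tends to read more clearly than passing it through `...`.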
82 votes, 7 answers

dplyr mutate rowSums calculations or custom functions

I'm trying to mutate a new variable from a sort of row calculation, say rowSums, as below: iris %>% mutate_(sumVar = iris %>% select(Sepal.Length:Petal.Width) %>% rowSums) the result is that "sumVar" is…
leoluyi • 942 • 1 • 7 • 14
82 votes, 9 answers

dplyr filter: Get rows with minimum of variable, but only the first if multiple minima

I want to make a grouped filter using dplyr, in a way that within each group only that row is returned which has the minimum value of variable x. My problem is: As expected, in the case of multiple minima all rows with the minimum value are…
Felix S • 1,769 • 2 • 13 • 17
81 votes, 3 answers

Arranging rows in custom order using dplyr

With the arrange function in dplyr, we can arrange rows in ascending or descending order. I wonder how to arrange rows in a custom order. Please see the MWE. Reg <- rep(LETTERS[1:3], each = 2) Res <- rep(c("Urban", "Rural"), times = 3) set.seed(12345) Pop <-…
MYaseen208 • 22,666 • 37 • 165 • 309
80 votes, 4 answers

How to specify "does not contain" in dplyr filter

I am quite new to R. Using the table called SE_CSVLinelist_clean, I want to extract the rows where the variable called where_case_travelled_1 DOES NOT contain the strings "Outside Canada" OR "Outside province/territory of residence but within…
ayk • 869 • 2 • 7 • 6
80 votes, 9 answers

dplyr: order columns alphabetically in R

If I have a large DF (hundreds and hundreds of columns) with different col_names randomly distributed alphabetically: df.x <- data.frame(2:11, 1:10, rnorm(10)) colnames(df.x) <- c("ID", "string", "delta") How would I order all of the data…
Zach • 1,316 • 2 • 14 • 21
79 votes, 5 answers

Set certain values to NA with dplyr

I'm trying to figure out a simple way to do something like this with dplyr (data set = dat, variable = x): dat$x[dat$x<0]=NA Should be simple but this is the best I can do at the moment. Is there an easier way? dat = dat %>%…
Glen • 1,722 • 3 • 18 • 25
79 votes, 2 answers

Can dplyr summarise over several variables without listing each one?

dplyr is amazingly fast, but I wonder if I'm missing something: is it possible to summarise over several variables? For example: library(dplyr) library(reshape2) df <- data.frame( sex = factor(rep(c("boy", "girl"), each = 2L)), age = c(52L, 58L,…
David F • 1,506 • 1 • 12 • 14
76 votes, 3 answers

How does one stop using rowwise in dplyr?

So, if one wishes to apply an operation row by row in dplyr, one can use the rowwise function, for example: Applying a function to every row of a table using dplyr? Is there an unrowwise function which you can use to stop doing operations row by row?…
Alex • 15,186 • 15 • 73 • 127