Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
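
As a rough illustration of the verbs named in the tag description, the sketch below chains filter, select, group_by and summarize on R's built-in mtcars data set. The data set, the hp > 100 threshold and the names by_cyl, mean_mpg and n_cars are assumptions chosen for the example, not part of the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; the columns and threshold are illustrative only.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%               # keep rows that satisfy a condition
  select(cyl, mpg, hp) %>%           # keep only the columns of interest
  group_by(cyl) %>%                  # split the rows into groups by cylinder count
  summarize(mean_mpg = mean(mpg),    # collapse each group to one summary row
            n_cars = n())

print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with %>%.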


36044 questions
6 votes, 2 answers

summarise returning -inf when using na.rm = TRUE

I recently built a simple R script to summarize three different data frames. Since updating to the newest version of R and RStudio, I am running into an output I haven't seen before when using the summarize function in dplyr for only one of the…
Matt Jordan
6 votes, 2 answers

Code not working using map from purrr package in R

I'm learning the map function in purrr package and have the following code not working: library(purrr) library(dplyr) df1 = data.frame(type1 = c(rep('a',5),rep('b',5)), x = 1:10, y = 11:20) df1 %>% group_by(type1) %>%…
Jason
6 votes, 2 answers

Join vectors into dataframe by matching values

I'm trying to compare multiple vectors to see where there are matching values between them. I'd like to combine the vectors into a table where every column either has the same value (for matches) or NA (for no match). For example: list1 <-…
Evan
6 votes, 3 answers

Order data frame by the last column with dplyr

library(dplyr) df <- tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10) ) df %>% arrange(colnames(df) %>% tail(1) %>% desc()) I am looping over a list of data frames. There are different columns in the data frames and the…
H. Yong
6 votes, 1 answer

n() acting inconsistently when used in summarise_at()

Using this example data: library(tidyverse) set.seed(123) df <- data_frame(X1 = rep(LETTERS[1:4], 6), X2 = sort(rep(1:6, 4)), ref = sample(1:50, 24), sampl1 = sample(1:50, 24), …
G_T
6 votes, 1 answer

Using mutate_at() with negated select helpers, e.g. not one_of()

I have data which looks like this: library(dplyr) set.seed(123) df <- data_frame(X1 = rep(LETTERS[1:4], 6), X2 = rep(1:2, 12), ref = sample(1:50, 24), sampl1 = sample(1:50, 24), …
G_T
6 votes, 1 answer

how to create factor variables from quosures in functions using ggplot and dplyr?

This is a follow up from how to combine ggplot and dplyr into a function?. The issue is, how to write a function that uses dplyr, ggplot and possibly specifying factor variables from quosures? Here is an example dataframe <- data_frame(id =…
ℕʘʘḆḽḘ
6 votes, 3 answers

Randomly remove duplicated rows using dplyr()

As a follow-up question to this one: Remove duplicated rows using dplyr, I have the following: How do you randomly remove duplicated rows using dplyr() (among others)? My command now is: data.uniques <- distinct(data, KEYVARIABLE, .keep_all =…
6 votes, 4 answers

Comparing between groups in grouped dataframe

I am trying to perform a comparison between items in subsequent groups in a dataframe - I guess this is pretty easy when you know what you are doing... My data set can be represented as follows: set.seed(1) data <- data.frame( date =…
CrustyNoodle
6 votes, 3 answers

Can you make dplyr::mutate and dplyr::lag default = its own input value?

This is similar to this dplyr lag post, and this dplyr mutate lag post, but neither of those ask this question about defaulting to the input value. I am using dplyr to mutate a new field that's a lagged offset of another field (that I've converted…
TheProletariat
6 votes, 3 answers

Fill value backwards from occurrence by group with condition

Problem: I would like to fill a value backwards from occurrence by group with a condition. I am trying to generate column C in the desired output. Set C equal to B and fill 1 backwards if A is <= 35, stop fill if A > 35. I am trying to complete…
BEMR
6 votes, 1 answer

Error in subsetting with $ immediately after a function in dplyr pipe

I could subset a single column with the following syntax for functions that return data.frame or list: library(dplyr) filter(mtcars, disp > 400)$mpg # [1] 10.4 10.4 14.7 But this causes the following error when used in a pipe (%>%): mtcars %>%…
mt1022
6 votes, 2 answers

Use dplyr coalesce in programming

I'd like to use dplyr's programming magic, new to version 0.7.0, to coalesce two columns together. Below, I've listed out a few of my attempts. df <- data_frame(x = c(1, 2, NA), y = c(2, NA, 3)) # What I want to do: mutate(df, y = coalesce(x,…
karldw
6 votes, 2 answers

Subset common rows from multiple data frames

I have multiple data frames like those shown below, each with a unique id for every row. I am trying to find the common rows and build a new data frame from the rows that appear in at least two of the data frames. For example, the row with Id=2 appears in all three data frames.…
user6037598
6 votes, 2 answers

How to use values from a previous row and column

I am trying to create a new variable which is a function of previous rows and columns. I have found the lag() function in dplyr but it can't accomplish exactly what I would like. library(dplyr) x = data.frame(replicate(2, sample(1:3,10,rep=TRUE))) …
Lee88