Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.
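As a rough illustration of the verbs named above, the following sketch chains them on the built-in mtcars dataset. The horsepower threshold and the output column name `mean_mpg` are arbitrary choices made for this example, not part of the tag description.

```r
library(dplyr)

# Keep cars with more than 100 hp, keep two columns,
# then compute the mean mpg for each cylinder count.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  select(cyl, mpg) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

print(by_cyl)
```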

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.

36044 questions
6 votes, 4 answers

Get last row of each group in R

I have some data similar in structure to: a <- data.frame("ID" = c("A", "A", "B", "B", "C", "C"), "NUM" = c(1, 2, 4, 3, 6, 9), "VAL" = c(1, 0, 1, 0, 1, 0)) And I am trying to sort it by ID and NUM then get the last…
Bear
  • 662
  • 1
  • 5
  • 20
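One possible approach to the question above, sketched with the data frame from the excerpt: sort by ID and NUM, then keep the final row of each ID group. The result name `last_rows` is made up for this example, and `slice_tail()` assumes dplyr 1.0.0 or later (`slice(n())` is the older equivalent).

```r
library(dplyr)

a <- data.frame(
  ID  = c("A", "A", "B", "B", "C", "C"),
  NUM = c(1, 2, 4, 3, 6, 9),
  VAL = c(1, 0, 1, 0, 1, 0)
)

# Sort within ID by NUM, then take the last row of each group.
last_rows <- a %>%
  arrange(ID, NUM) %>%
  group_by(ID) %>%
  slice_tail(n = 1) %>%
  ungroup()

print(last_rows)
```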
6 votes, 1 answer

Writing a function to use with spark_apply() from sparklyr

test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe", "boat", "boat","boat","boat","boat","boat"), 'seller_id'= c("a", "b", "c", "d", "e", "f", "a","g", "h", "r", "q", "b"), 'Dich'= c(1, 0,…
Kreitz Gigs
  • 369
  • 1
  • 9
6 votes, 5 answers

Enter value from df based on condition across multiple columns into new variable

I am sure I am not the only person who has asked this but after hours of searching with no luck I need to ask the question myself. I have a df (rp) like so: rp <- structure(list(agec1 = c(7, 16, 11, 11, 17, 17), agec2 = c(6, 12, 9,…
6 votes, 2 answers

Pass a single argument as dots in tidyeval

I am trying to wrap dplyr::filter within a function where when there is more than one filter condition, then they are passed as a vector or list. See this minimal example: filter_wrap <- function(x, filter_args) { filter_args_enquos <-…
zeehio
  • 4,023
  • 2
  • 34
  • 48
6 votes, 1 answer

Tilde Dot in R (~.)

Can anyone explain the tilde dot (~.) in R? I have seen some posts about it already. I know the tilde is used for formulas, specifying the independent and dependent variables. And, I know that the dot is used to indicate all other variables. More…
bg47
  • 63
  • 1
  • 4
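Because the excerpt is cut off, the sketch below only contrasts two common readings of `~ .`; which one the question has in mind is an assumption. In a model formula, `.` on the right-hand side stands for all other columns of the data. In tidyverse functions that accept purrr-style lambdas, `~ . * 2` is shorthand for a one-argument function whose argument is `.` (also available as `.x`).

```r
library(dplyr)

# 1. Formula sense: regress mpg on every other column of mtcars.
fit <- lm(mpg ~ ., data = mtcars)

# 2. Lambda sense: `~ . * 2` behaves like function(x) x * 2.
doubled_lambda <- mtcars %>% mutate(across(c(mpg, hp), ~ . * 2))
doubled_fn     <- mtcars %>% mutate(across(c(mpg, hp), function(x) x * 2))
```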
6 votes, 2 answers

Conditional running count (cumulative sum) with reset in R (dplyr)

I'm trying to calculate a running count (i.e., cumulative sum) that is conditional on other variables and that can reset for particular values on another variable. I'm working in R and would prefer a dplyr-based solution, if possible. I'd like to…
itpetersen
  • 1,475
  • 3
  • 13
  • 32
6 votes, 2 answers

R: dplyr and row_number() does not enumerate as expected

I want to enumerate each record of a dataframe/tibble that results from a grouping. The index should follow a defined order. If I use row_number(), it enumerates, but only within each group. I want it to enumerate without considering the former…
giordano
  • 2,954
  • 7
  • 35
  • 57
6 votes, 2 answers

How to combine multiple dataframe by MonthYear in R

I have the following data frames: DF1: Origination_Date Count1 Count2 2018-07-01 147 205 2018-07-05 180 345 2018-07-08 195 247 2018-08-04 205 …
Jupiter
  • 221
  • 1
  • 12
6 votes, 3 answers

How to find first element of a group that fulfill a condition

structure(list(group = c(17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 22L, 23L, 24L, 25L, 25L, 25L, 26L, 27L, 27L, 27L, 28L), var = c(74L, 49L, 1L, 74L, 1L, 49L, 61L, 49L, 1L, 5L, 5L, 1L, 44L, 44L, 12L, 13L, 5L, 5L,…
jakes
  • 1,964
  • 3
  • 18
  • 50
6 votes, 2 answers

select non-missing variables in a purrr loop

Consider this example mydata <- data_frame(ind_1 = c(NA,NA,3,4), ind_2 = c(2,3,4,5), ind_3 = c(5,6,NA,NA), y = c(28,34,25,12), group = c('a','a','b','b')) >…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
6 votes, 5 answers

summing multiple columns in an R data-frame quickly

I have a data frame like mtcars, and a string vector of column names such as c("mpg", "cyl", "disp", "hp", "drat"), and I would like to sum together all of the columns into a new one. I would normally use something like mtcars %>%…
user438383
  • 5,716
  • 8
  • 28
  • 43
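A minimal sketch of one way to do this, assuming dplyr 1.1.0 or later for `pick()` (on older versions, `across(all_of(cols))` plays the same role). The new column name `total` is hypothetical. `rowSums()` is vectorized over rows, so it avoids a row-by-row loop.

```r
library(dplyr)

cols <- c("mpg", "cyl", "disp", "hp", "drat")

# Select the named columns inside mutate() and add them up row by row.
summed <- mtcars %>%
  mutate(total = rowSums(pick(all_of(cols))))

head(summed)
```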
6 votes, 4 answers

dplyr lag with n from column values

Is it possible to use column values as n in a dplyr::lag function? Reproducible example: DF <- data.frame( V = runif(1000, min=-100, max=100), nlag = as.integer(runif(1000, min=1, max=10)) ) %>% mutate(Vlag = lag(V, n = nlag)) I get this…
Medical physicist
  • 2,510
  • 4
  • 34
  • 51
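`dplyr::lag()` expects `n` to be a single number, which is why passing a column fails. One workaround, sketched here on a small deterministic data frame instead of the random data in the excerpt, is to compute the source row index directly. The helper column `idx` is hypothetical, and the sketch assumes ungrouped data.

```r
library(dplyr)

DF <- data.frame(
  V    = c(10, 20, 30, 40, 50),
  nlag = c(1L, 1L, 2L, 3L, 1L)
)

DF <- DF %>%
  mutate(
    # Row to read from: current row minus the per-row lag.
    idx  = row_number() - nlag,
    # Indices before the first row become NA, so the lookup yields NA.
    Vlag = V[replace(idx, idx < 1, NA)]
  ) %>%
  select(-idx)

print(DF)
```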
6 votes, 2 answers

Multiply pairs of columns using dplyr in R

I have a dataframe with crime data and associated "prices", organized by country and year (although I don't think this is important here). Here is a subset of my data: > crime # A tibble: 8 x 8 iso year theft robbery burglary theft_price…
avs
  • 617
  • 5
  • 13
6 votes, 3 answers

Find out if 2 tables (`tbl_spark`) are equal without collecting them using sparklyr

Consider two tables or table references in Spark that you want to compare, e.g. to ensure that your backup worked correctly. Is it possible to do that remotely in Spark? Because it's not useful to copy all the data to R using…
nachti
  • 1,086
  • 7
  • 20
6 votes, 1 answer

After doing bind_rows() and rbind() on same data.tables , identical() = FALSE?

Caveat: novice. I have several data.tables with millions of rows each, variables are mostly dates and factors. I was using rbindlist() to combine them because. Yesterday, after breaking up the tables into smaller pieces vertically (instead of the…
armipunk
  • 458
  • 2
  • 13