Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
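
As a minimal sketch of the verbs named in the tag description (filter, select, group_by, summarize), the pipeline below runs them in sequence on the built-in mtcars data set. The threshold, the column choices, and the names `result`, `mean_mpg`, and `n_cars` are illustrative assumptions, not part of the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; the threshold and column choices are illustrative only.
result <- mtcars %>%
  filter(hp > 100) %>%              # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%          # keep only the columns of interest
  group_by(cyl) %>%                 # one group per cylinder count
  summarize(mean_mpg = mean(mpg),   # collapse each group to a single row
            n_cars = n())           # number of cars in each group

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is why the steps chain with `%>%`.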

Vignettes

Some vignettes have been moved to other related packages.

36044 questions
6 votes, 1 answer

Pipe in magrittr package is not working for function rm()

`x = 10; rm(x)` removes x from the environment, but `x = 10; x %>% rm()` doesn't remove the variable x. 1) Why doesn't the pipe technique remove the variable? 2) How can I alternatively use the pipe and rm() to remove a variable? Footnote: This question is…
Ashrith Reddy
6 votes, 6 answers

Rank most recent scores of students within a given date - 30 days window

Following is what my dataframe/data.table looks like. The rank column is my desired calculated field. library(data.table) df <- fread(' Name Score Date Rank John 42 1/1/2018 3 …
gibbz00
6 votes, 3 answers

How to use `stringr` in `dplyr` pipe

I am having trouble with this code which attempts to edit some strings in a dplyr pipe. Here is some data that throws the following error. Any ideas? data_frame(id = 1:5, name = c('this and it pretty long is a', 'name…
elliot
6 votes, 2 answers

How to pass by argument to dplyr join function within a function?

I would like to pass an unquoted variable name x to a left_join function. The output I expect is the same as if I ran: left_join(mtcars, mtcars, by = c('mpg' = 'mpg')) I'm trying this: ff <- function(x) { x <- enquo(x) left_join(mtcars,…
Dambo
6 votes, 2 answers

Creating a named vector using dplyr

I am trying to find a way to create a named vector from two columns in a data frame (one of values, one of names) using pipes. Thus far I have the following (using mtcars as example data)... library(tidyverse) x <- mtcars %>% …
guyabel
6 votes, 3 answers

Mutating dummy variables in dplyr

I want to create 7 dummy variables, one for each day, using dplyr. So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append to the original df. #Sample…
Lefkios Paikousis
6 votes, 1 answer

kmeans clustering in grouped data

I am trying to find the centers of the clusters in grouped data. Using a sample data set and problem definition, I am able to create kmeans clusters within each group. However, when it comes to addressing each center of the cluster for given…
Alexander
6 votes, 1 answer

How to Create Required Matrix Using Dataframe in R

I have one dataframe which looks like: DF_1> T_id D1 D2 Num type type_2 fig xt-1 2017-05-01 2017-03-25 12:11:45 10 A X 25.20 xt-2 2017-05-01 2017-03-25 21:05:25 20 A …
Rahul shah
6 votes, 3 answers

dplyr / R cumulative sum with reset

I'd like to generate cumulative sums with a reset if the "current" sum exceeds some threshold, using dplyr. In the below, I want to cumsum over 'a'. library(dplyr) library(tibble) tib <- tibble( t = c(1,2,3,4,5,6), a = c(2,3,1,2,2,3) ) # what…
schnee
6 votes, 4 answers

r - Efficiently create variable indicating if date variable precedes event (by group)

I have two dates (date1 and date2) and an id variable in a data.frame: dat <- data.frame(c('2014-02-11', '2014-05-04', '2014-05-22'), c('2014-04-12', '2014-09-22', '2014-07-04'), c('a', 'a', 'b')) names(dat) <- c('date1', 'date2', 'id') dat$date1 <-…
kathystehl
6 votes, 2 answers

dplyr::filter "No tidyselect variables were registered"

I am trying to filter specific rows of my tibble using the dplyr::filter() function. Here is part of my tibble head(raw.tb): A tibble: 738 x 4 geno ind X Y 1 san1w16 A1 467 383 2 san1w16 A1 …
Al3xEP
6 votes, 1 answer

How can I speed up spatial operations in `dplyr::mutate()`?

I am working on a spatial problem using the sf package in conjunction with dplyr and purrr. I would prefer to perform spatial operations inside a mutate call, like so: simple_feature %>% mutate(geometry_area = map_dbl(geometry, ~…
Tiernan
6 votes, 1 answer

Using dplyr::group_by() to find min dates with NAs

I'm finding the minimum date within a group. Many times, the group includes only missing dates (in which case I'd prefer something like NA to be assigned). The NAs appear to be assigned correctly, but they're not responding to is.na() as I expect. …
wibeasley
6 votes, 2 answers

Get indices of common rows from two different dataframes

I have two dataframes: df1 <- data.frame(cola = c("dum1", "dum2", "dum3"), colb = c("bum1", "bum2", "bum3"), colc = c("cum1", "cum2", "cum3")) and: df2 <- data.frame(cola = c("dum1", "dum2", "dum4"), colb = c("bum1", "bum2", "bum3")) I need to…
Cactus
6 votes, 1 answer

dplyr::select_if can use colnames and their values at the same time?

I want to select cols using colnames and their values in a single pipe chain without referring to other objects, such as NAMES <- names(d). Can I do it with select_if()? For example, I can use colnames to select cols. (select(matches(...)) is…
cuttlefish44