Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The R package dplyr is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
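The verbs named in the tag description (filter, group_by, summarize, select) are usually chained with the `%>%` pipe, which dplyr re-exports from magrittr. The sketch below is only an illustration. It assumes dplyr is installed and uses R's built-in `mtcars` dataset; it is not taken from any question listed here.

```r
library(dplyr)

# Keep 4- and 6-cylinder cars, group them by cylinder count,
# then compute the mean mpg and the number of cars in each group.
result <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), n = n()) %>%
  select(cyl, mean_mpg, n)

print(result)
```

With a single grouping variable, `summarize()` drops that grouping, so `result` is an ungrouped tibble with one row per cylinder count.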

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36044 questions

1 answer

Pipe in magrittr package is not working for function rm()

x = 10
rm(x)        # removed x from the environment
x = 10
x %>% rm()   # Doesn't remove the variable x
1) Why doesn't the pipe technique remove the variable? 2) How can I use a pipe together with rm() to remove a variable? Footnote: This question is…

r dplyr magrittr

asked Apr 04 '18 at 03:59

Ashrith Reddy

1,022 reputation

6 answers

Rank most recent scores of students within a given date - 30 days window

Following is what my dataframe/data.table looks like. The rank column is my desired calculated field. library(data.table) df <- fread(' Name Score Date Rank John 42 1/1/2018 3 …

r dplyr data.table rank

asked Apr 02 '18 at 22:26

gibbz00

1,947 reputation

3 answers

How to use `stringr` in `dplyr` pipe

I am having trouble with this code which attempts to edit some strings in a dplyr pipe. Here is some data that throws the following error. Any ideas? data_frame(id = 1:5, name = c('this and it pretty long is a', 'name…

r dplyr stringr

asked Mar 22 '18 at 20:53

elliot

1,844 reputation

2 answers

How to pass by argument to dplyr join function within a function?

I would like to pass an unquoted variable name x to a left_join function. The output I expect is the same as if I ran: left_join(mtcars, mtcars, by = c('mpg' = 'mpg')) I'm trying this: ff <- function(x) { x <- enquo(x) left_join(mtcars,…

r dplyr tidyverse rlang

asked Mar 17 '18 at 20:54

Dambo

3,318 reputation

2 answers

Creating a named vector using dplyr

I am trying to find a way to create a named vector from two columns in a data frame (one of values, one of names) using pipes. Thus far I have the following (using mtcars as example data)... library(tidyverse) x <- mtcars %>% …

r vector dplyr

asked Mar 17 '18 at 11:42

guyabel

8,014 reputation

3 answers

Mutating dummy variables in dplyr

I want to create 7 dummy variables, one for each day, using dplyr. So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append it to the original df #Sample…

r dplyr dummy-variable

asked Mar 14 '18 at 11:42

Lefkios Paikousis

1 answer

kmeans clustering in grouped data

Currently, I am trying to find the centers of the clusters in grouped data. Using a sample data set and problem definitions, I am able to create k-means clusters within each group. However, when it comes to addressing each center of the cluster for given…

r machine-learning dplyr k-means

asked Mar 06 '18 at 01:34

Alexander

4,527 reputation

1 answer

How to Create Required Matrix Using Dataframe in R

I have one dataframe which looks like: DF_1> T_id D1 D2 Num type type_2 fig xt-1 2017-05-01 2017-03-25 12:11:45 10 A X 25.20 xt-2 2017-05-01 2017-03-25 21:05:25 20 A …

r dataframe ggplot2 dplyr

asked Mar 04 '18 at 17:39

Rahul shah

3 answers

dplyr / R cumulative sum with reset

I'd like to generate cumulative sums with a reset if the "current" sum exceeds some threshold, using dplyr. In the below, I want to cumsum over 'a'. library(dplyr) library(tibble) tib <- tibble( t = c(1,2,3,4,5,6), a = c(2,3,1,2,2,3) ) # what…

r dplyr

asked Mar 02 '18 at 20:22

schnee

1,050 reputation

4 answers

r - Efficiently create variable indicating if date variable precedes event (by group)

I have two dates (date1 and date2) and an id variable in a data.frame: dat <- data.frame(c('2014-02-11', '2014-05-04', '2014-05-22'), c('2014-04-12', '2014-09-22', '2014-07-04'), c('a', 'a', 'b')) names(dat) <- c('date1', 'date2', 'id') dat$date1 <-…

r date group-by dplyr data.table

asked Mar 02 '18 at 04:52

kathystehl

2 answers

dplyr::filter "No tidyselect variables were registered"

I am trying to filter specific rows of my tibble using the dplyr::filter() function. Here is part of my tibble head(raw.tb): A tibble: 738 x 4 geno ind X Y 1 san1w16 A1 467 383 2 san1w16 A1 …

r regex dplyr tidyverse tidyselect

asked Feb 04 '18 at 12:13

Al3xEP

1 answer

How can I speed up spatial operations in `dplyr::mutate()`?

I am working on a spatial problem using the sf package in conjunction with dplyr and purrr. I would prefer to perform spatial operations inside a mutate call, like so: simple_feature %>% mutate(geometry_area = map_dbl(geometry, ~…

r dplyr purrr r-sf

asked Jan 31 '18 at 21:09

Tiernan

1 answer

Using dplyr::group_by() to find min dates with NAs

I'm finding the minimum date within a group. Many times, the group includes only missing dates (in which case I'd prefer something like NA to be assigned). The NAs appear to be assigned correctly, but they're not responding to is.na() as I expect. …

r date dplyr na

asked Jan 26 '18 at 22:56

wibeasley

5,000 reputation

2 answers

Get indices of common rows from two different dataframes

I have two dataframes: df1 <- data.frame(cola = c("dum1", "dum2", "dum3"), colb = c("bum1", "bum2", "bum3"), colc = c("cum1", "cum2", "cum3")) and: df2 <- data.frame(cola = c("dum1", "dum2", "dum4"), colb = c("bum1", "bum2", "bum3")) I need to…

r dplyr

asked Jan 08 '18 at 11:34

Cactus

1 answer

dplyr::select_if can use colnames and their values at the same time?

I want to select cols using colnames and their values in a single pipe chain without referring to other objects, such as NAMES <- names(d). Can I do it with select_if()? For example, I can use colnames to select cols. (select(matches(...)) is…

r dplyr

asked Dec 30 '17 at 10:18

cuttlefish44

6,586 reputation