Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
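
As a rough illustration of the verbs named in the tag description, the sketch below chains filter, select, group_by and summarize on the built-in mtcars data set. The data set, the hp > 100 threshold and the names result, n and mean_mpg are assumptions chosen for the example, not part of the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; it is used here purely as illustrative data.
result <- mtcars %>%
  filter(hp > 100) %>%          # keep rows that satisfy a condition
  select(cyl, mpg, hp) %>%      # keep only the named columns
  group_by(cyl) %>%             # define groups by number of cylinders
  summarize(                    # collapse each group to a single row
    n = n(),
    mean_mpg = mean(mpg)
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with the pipe.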

36044 questions
6 votes, 4 answers

Using dplyr filter() in programming

I am writing my function and want to use dplyr's filter() function to select rows of my data frame that satisfy a condition. This is my code: library(tidyverse) df <-data.frame(x = sample(1:100, 50), y = rnorm(50), z = sample(1:100,50), w =…
Kay

6 votes, 2 answers

R dplyr summarise bug?

library(tidyverse) stats <- read_csv('stats.csv') ## Warning: Installed Rcpp (0.12.12) different from Rcpp used to build dplyr (0.12.11). ## Please reinstall dplyr to avoid random crashes or undefined behavior. I am pretty sure that I got the same…
Y.Y

6 votes, 2 answers

How to transform a string into a factor and set its contrasts using dplyr/magrittr piping

I have a rather specific question: how can I make a string into a factor and set its contrasts within a pipe? Let's say that I have a tibble like the following tib <- data_frame (a = rep(c("a","b","c"),3, each = T), val = rnorm(9)) Now, I could…
Federico Nemmi

6 votes, 1 answer

Combining multiple columns into one in R

How can I combine all of a data frame's columns into just one column, in an efficient way? I mean without using the column names, using dplyr or tidyr in R, because I have too many columns (10,000+). For example, converting this data frame >…
Forever

6 votes, 4 answers

Remove columns the tidyeval way

I would like to remove a vector of columns using dplyr >= 0.7 library(dplyr) data(mtcars) rem_cols <- c("wt", "qsec", "vs", "am", "gear", "carb") head(select(mtcars, !!paste0("-", rem_cols))) Error: Strings must match column names. Unknown…
Scott

6 votes, 1 answer

dplyr 0.5.0 mutate using column index

I've updated dplyr (now 0.7.1) and a lot of my old code does not work because mutate_each has been deprecated. I used to do something like this (code below) with mutate_each using the column index. I'd do this on hundreds of columns. And I just can't…
Kevin

6 votes, 2 answers

Using n() at the same time as calculating other summary statistics

I am having trouble preparing a summary table using dplyr based on the data set below: set.seed(1) df <- data.frame(rep(sample(c(2012,2016),10, replace = T)), sample(c('Treat','Control'),10,replace = T), …

6 votes, 2 answers

Need help speeding up a dplyr aggregation

tl;dr: I have an aggregation problem that I haven't seen in documentation before. I manage to get it done, but it is way too slow for the intended application. The data I usually work with have around 500 lines (my gut feeling tells me this isn't…
bdecaf

6 votes, 1 answer

Setting column names when using bind_cols (r, dplyr)

I have a data.frame (df) which contains another data.frame called url_variables. url_variables = df$url_variables url_variables contains many other data.frames such as source, campaign, page and many others. Each of these data frames has the 3…
Nick5a1

6 votes, 2 answers

Looping with dplyr on each row of dataframe

I have a dataframe df <- data.frame(var1=c(10,20,30,40,50), var2=c(rep(0.3,5)), BYGROUP_OBSNUM=c(0:4))

var1  var2  BYGROUP_OBSNUM
10    0.3   0
20    0.3   1
30    0.3   2
40    0.3   3
50    0.3   4

I need to perform…
Riya

6 votes, 2 answers

Match in lagged group in data.table

I'm trying to create a new column that indicates if an ID was present in a previous group. Here's my data: data <- data.table(ID = c(1:3, c(9,2,3,4),c(5,1)), groups = c(rep(c("a", "b", "c"), c(3, 4,2)))) ID groups 1: 1 …
Pierre Lapointe

6 votes, 2 answers

Merge two lists of dataframes

I have two big lists of dataframes that I want to merge. Here is a sample of the data. list1 = list(data.frame(Wvlgth = c(337, 337.5, 338, 338.5, 339, 339.5), Global = c(".9923+00",".01245+00", ".0005+00", ".33421E+00", ".74361+00",…
ale19

6 votes, 2 answers

R: How to spread, group_by, summarise and mutate at the same time

I want to spread this data below (first 12 rows shown here only) by the column 'Year', returning the sum of 'Orders' grouped by 'CountryName'. Then calculate the % change in 'Orders' for each 'CountryName' from 2014 to 2015. CountryName Days …
RDJ

6 votes, 1 answer

What is the purpose of dtplyr and the reason for the warning 'Please library(dtplyr)!'?

On loading the latest version of data.table (1.10.4) I get this message: > library(data.table) data.table…
Alex

6 votes, 2 answers

Filter all days between a time range

I have a data frame like below: entry_no id time _________ ___ _____ 1 1 2016-09-01 09:30:09 2 2 2016-09-02 10:36:18 3 1 2016-09-01 12:27:27 4 3 …
Ricky