Questions tagged [summarize]

A dplyr function (actually named summarise()) that creates a new data frame by collapsing each group, as defined by the given grouping variables, into summary rows. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
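As a minimal sketch of the behaviour described above, the following uses the built-in mtcars data set (chosen here only for illustration; it is not part of the tag description). It assumes dplyr 1.0.0 or later, since it relies on the .groups argument.

```r
library(dplyr)

# Ungrouped input: the output has a single row summarising all observations.
overall <- mtcars %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Grouped input: one row per combination of the grouping variables,
# with one column per grouping variable and one per summary statistic.
by_group <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(mean_mpg = mean(mpg), n = n(), .groups = "drop")

print(overall)
print(by_group)
```

Here .groups = "drop" returns an ungrouped result, which avoids carrying the remaining grouping over into later steps.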

836 questions
3 votes, 2 answers

Computing difference between averages by group and logical values using dplyr

Does anyone know of a way to use dplyr to compute the difference between averages for some_var == TRUE and some_var == FALSE, grouped by a third variable? For example, given the following example dataframe: library('dplyr') dat <- iris %>% …
Keith Hughitt
3 votes, 2 answers

Dplyr summarize with which.max and data with NA's

I am working with a data set of changes over time and need to calculate the time at which the peak change occurs. I am running into a problem because some subjects have missing data (NA's). Example: library(dplyr) Data <- structure(list(Subject =…
JLC
3 votes, 2 answers

Collapsing rows with dplyr

I am new to R and am trying to collapse rows based on row values with dplyr. The following example shows the sample data. set.seed(123) df<-data.frame(A=c(rep(1:4,4)), B=runif(16,min=0,max=1), C=rnorm(16,…
G1124E
3 votes, 0 answers

What's the use of ~ in summarise_ function?

I'm using dplyr's summarise_ function and became curious about the use of ~. For example, summarise_(school, .dots = list(~ mean(PE), ~ mean(Math), ~n())) gives me the means of 2 variables and the number of observations. But why should I…
Mons2us
3 votes, 2 answers

python pandas summarizing nominal variables (counting)

I have the following data frame:

    KEY  PROD  PARAMETER  Y/N
    1    AAA   PARAM1     Y
    1    AAA   PARAM2     N
    1    AAA   PARAM3     N
    2    AAA   PARAM1     N
    2    AAA   PARAM2     Y
    2    AAA   PARAM3     Y
    3    CCC   PARAM1     Y
    3    CCC   PARAM2     Y
    3    CCC   …
Felix
3 votes, 1 answer

SVN diff - option '--summarize'

Whenever I use svn diff --summarize I get something like: A *mylinkhere* M *mylinkhere*. What are those beginning letters (e.g. A, M) for?
SnuKies
3 votes, 2 answers

Count occurrences of a string, by row, in a large data frame

I am trying to count a binary character outcome by row in a large data frame:

    V1    V2    V3    V4    V5
    Loss  Loss  Loss  Loss  Loss
    Loss  Loss  Win   Win   Loss
    Loss  Loss  Loss  Loss  Loss

Reprex: df <-…
mike
2 votes, 1 answer

Consolidate data in several columns to group by unique value in another column

I am hoping to consolidate data in several columns to group by unique value in another column.

    group_id  food_1  food_2  food_3
    1         1       0       0
    1         0       2       0
    1         0       0       6
    2         2       0       0
    2         0       1       0
    2         0       0       5

I would like it to be consolidated so it is one row for…
2 votes, 5 answers

R: Using group_by for all values

I am working with the R programming language. I have the following dataset: library(dplyr) df = structure(list(ethnicity = c("c", "c", "c", "b", "c", "b", "b", "b", "c", "a", "b", "b", "a", "b", "c", "a", "c", "c", "a", "a", "a", "a", "c", "b",…
stats_noob
2 votes, 3 answers

How to merge the values of duplicate rows into one single row

I have a dataframe like this: df <- structure(list(A = c(2, 3, 1), B = c(3, 2, 1), C = c(4, 5, 1), D = c(4, 4, 1), Genus = c("Ensifer", "Ensifer", "Ensifer" )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)) A …
2 votes, 1 answer

Is there a way to summarize values grouped by years while keeping the index?

I tried to summarize values of different years which are assigned to specific IDs. I used dplyr to summarize it but did not find a way to keep the index. My data looks something like this: year <- c(2015, 2015, 2015, 2016, 2016, 2017, 2017, 2018,…
sebmw
2 votes, 1 answer

Reworking old code with deprecated funs() and cannot get n() to work

I have some older code that I am trying to rework since funs() has been deprecated (I know, I'm way behind!). I often use the output that this style of summarise_if gives, but cannot get it to work with list(). Older Code: iris_means<-iris %>% …
user3490557
2 votes, 2 answers

Improve runtime of group_by and summarize

I have a data frame df of around 10 million employees. Each employee has an ID and there is a city variable and a company variable that shows where they work:

    ID  city  company
    1   NYC   ABC
    2   BOS   ABC
    1   NYC   DEF
    3   SEA   GHI

I want to group_by ID and…
questionmark
2 votes, 2 answers

Is there a way to remove duplicates but add the value if it appears at least once in the n selected columns?

For example, imagine there is a dataset that looks like this (Edit: added Date and Num columns for extra context): ID|Date |Col1|Col2|Col3|Num 1 10-10 Y 5 1 10-10 Y Y 5 1 10-10 Y 5 2 09-17 Y …
user35131
2 votes, 2 answers

Summary statistics with grouping by multiple columns dataframe vs. data.table vs. dplyr

I'm working my way through the Titanic Study in Frank Harrell's R Flow course (http://hbiostat.org/rflow/case.html) and have a question about summarizing data. The raw data (Titanic5.csv) can be downloaded from…
Thomas Philips