Questions tagged [summarize]

A dplyr function (actually named summarise()) that creates a new data frame by summarising data according to the given grouping variables. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
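
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, assuming dplyr is installed; the data frame `df` and its columns `city` and `temp` are hypothetical and are not taken from any question on this page.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC"),
  temp = c(82, 91, 78, 71, 75)
)

# Grouped: one row per city, with a column for the grouping variable
# and one column for each summary statistic requested
by_city <- df %>%
  group_by(city) %>%
  summarise(mean_temp = mean(temp), n = n())

# Ungrouped: a single row summarising all observations
overall <- df %>%
  summarise(mean_temp = mean(temp), n = n())
```

Here `by_city` has one row per distinct city, while `overall` collapses the whole input into one row.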

836 questions
2 votes, 2 answers

group_by and count number of elements in each column in R

I have a data table like below:

city     year  t_20  t_25
Seattle  2019    82    91
Seattle  2018     0   103
NYC      2010    78     8
DC       2011    71     0
DC       2011     0     0
DC …
OverFlow Police
2 votes, 3 answers

R: How to summarize multiple variables with different functions?

I have a data frame in which for each grouping variable, there are two types of variables: one set for which I need the mean within each group, the other one for which I need the sum within each group. That is, I want to apply two different summary…
emphasent
2 votes, 3 answers

How to take non-missing value associated with max index for each group using summarize_all

I want to find the non-missing value of each group associated with the largest index value, for many columns. I have gotten fairly close by using summarize_all with which.max but I am not sure how to remove the NAs from each vector before I find the…
Hunter Clark
2 votes, 1 answer

How to obtain species richness and abundance for sites with multiple samples using dplyr

Problem: I have a number of sites, with 10 sampling points at each site.

Site  Time  Sample  Species1  Species2  Species3  etc
Home  A     1       1         0         4         ...
Home  A     2       0         0         2         ...
Work  A     1       0 …
Grubbmeister
2 votes, 1 answer

Rolling weighted mean across two factor levels or time points

I would like to create a rolling 2-quarter average for alpha, bravo and charlie (and lots of other variables). Research is taking me to the zoo and lubridate packages, but I always seem to end up rolling within one variable or grouping…
Michael Bellhouse
2 votes, 3 answers

Number of categories not equal to a specific one

I have a data frame with many categorical columns. I would like to count the number of distinct categories not equal to "bla". So for example:

> d1
# A tibble: 5 x 2
  x      y
1 yellow A
2 green  A
3 green  bla
4 blue…
Omry Atia
2 votes, 1 answer

plyr summarize count error row length

suppose I have the following data: A <- c(4,4,4,4,4) B <- c(1,2,3,4,4) C <- c(1,2,4,4,4) D <- c(3,2,4,1,4) filt <- c(1,1,10,8,10) data <- as.data.frame(rbind(A,B,C,D,filt)) data <- t(data) data <- as.data.frame(data) > data A B C d filt V1…
Ellie
2 votes, 1 answer

Combining multiple sentences into one text string in Python

I am trying to join separate sentences into one text object so that I can run it through the Gensim generator. In order for it to work, there need to be at least 2 sentences. According to my output, it looks as though I have more than two sentences…
user9608799
2 votes, 1 answer

Using summarize with a named vector

I am trying to use summarize, where the vector being summarized has names. The summarize function copies these names to the output, but the length is now wrong. When I try to format the resulting summary, the incorrect length of the names attribute…
2 votes, 1 answer

Using variables as arguments in summarize()

I wish to pass user input variables to group_by() and summarize() functions. The direct example of the data frame and code is below. Here I am 'hard-coding' the column names. library(dplyr) df <- data.frame('Category' =…
2 votes, 1 answer

dplyr: append summarise rows by threshold variable

Constraint: using dplyr or another tidyverse library. Objective: I'd like to summarise data using a threshold. The threshold takes many values, and I'd like to append/collate these summary results. Minimal reproducible example: df <- data.frame(colA=c(1,2,1,1), …
Hedgehog
2 votes, 3 answers

R: Cleaning up a wide and untidy dataframe

I have a data frame that looks like: d<-data.frame(id=(1:9), grp_id=(c(rep(1,3), rep(2,3), rep(3,3))), a=rep(NA, 9), b=c("No", rep(NA, 3), "Yes", rep(NA, 4)), …
user2230555
2 votes, 2 answers

Output of the dplyr summarize() function

Is there a convenient way to have dplyr::summarize_all() output the results in a more readable format without having to manually rearrange it after the fact? Ultimately, I'd like to be able to port the output of summarize more easily to tables in…
Peter Miksza
2 votes, 4 answers

Conditional summing across columns with dplyr

I have a data frame with four habitats sampled over eight months. Ten samples were collected from each habitat each month. The number of individuals for species in each sample was counted. The following code generates a smaller data frame of a…
Michael S Taylor