Questions tagged [summarize]

A dplyr verb, summarise() (also exported under the US spelling summarize()), that creates a new data frame summarising the data according to the given grouping variables, usually set beforehand with group_by(). When using this tag, state the dplyr version you are working with.

summarise() creates a new data frame. It has one row (or more) for each combination of grouping variables; if there are no grouping variables, the output has a single row (or more, as of dplyr 1.0.0) summarising all observations in the input. It contains one column for each grouping variable and one column for each summary statistic you specify. As of dplyr 1.1.0, returning more than one row per group from summarise() is deprecated in favour of reframe().
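A minimal R sketch of the grouped and ungrouped behaviour described above. It assumes dplyr 1.0.0 or later (needed for the `.groups` argument); the data frame `df` and its columns `grp` and `x` are hypothetical illustration data, not taken from any question.

```r
library(dplyr)

# Hypothetical example data: a grouping column `grp` and a numeric column `x`
df <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, 3, 2, 4, 6)
)

# Grouped: one row per group, one column for the grouping variable,
# and one column per summary statistic.
# .groups = "drop" returns an ungrouped result (same effect as a trailing ungroup()).
by_grp <- df %>%
  group_by(grp) %>%
  summarise(n = n(), mean_x = mean(x), .groups = "drop")
print(by_grp)
# grp "a": n = 2, mean_x = 2
# grp "b": n = 3, mean_x = 4

# Ungrouped: a single row summarising every observation
overall <- df %>%
  summarise(n = n(), mean_x = mean(x))
print(overall)
# n = 5, mean_x = 3.2

# summarize() is an alias of summarise(), so this gives the same result
overall_us <- df %>%
  summarize(n = n(), mean_x = mean(x))
```

Using `.groups = "drop"` makes the grouping state of the result explicit instead of relying on the default, which only peels off the last grouping variable.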

836 questions

1 vote, 2 answers

Efficient way to create a dataframe with multiple summary columns based on a grouped dataframe using dplyr in R

I have a dataframe similar to this dummy: dframe <- structure(list(id = c("294361-7349174-75411122", "294365-7645230-95464222", "291915-7345264-75464222", "291365-7345074-75164202", "594165-7345274-78444212", "234385-7335274-75464229",…
ramen

1 vote, 1 answer

Smooth multiple columns of dataset with summarize

I am trying to smooth the data by rounding the variable "depth" and then applying summarize to the dataset. mean_safely <- possibly(.f = mean, otherwise = NA) SdesGG <- SdesGG %>% filter(., depth > 2) %>% mutate(depth = round(depth,…
C. Guff

1 vote, 1 answer

Group_by ID, then keep row with attribute R

Table: ID <- c("01", "01", "02", "02") Accept_Medicare <- c("Opt-out", "Accept", "Opt-Out", "Accept") Data <- c("yes", "no", "no", "no") I have a dataset with multiple rows for the same ID and a column "Accept_Medicare." I want to deduplicate the data…
benzinga

1 vote, 0 answers

How to apply function with multiple outputs on each group in R and store results in different columns?

Suppose I am using panel data: for each individual and time, there is an observation of a numerical variable. I want to apply a function to this numerical variable but this function outputs a vector of numbers. I'd like to apply this function over…
Raul Guarini Riva

1 vote, 2 answers

Kusto: Self join table and get values from different rows

Working with a dataset similar to the one below, I can get the desired output by using the scan operator to fill forward strings/bools in the test dataset; however, it times out on larger datasets, as every property has many events and there are millions…
Sahil Raj

1 vote, 1 answer

Difference between .groups argument and ungroup() in dplyr?

I'm looking at some code: df1 <- inner_join(metadata, otu_counts, by="sample_id") %>% inner_join(., taxonomy, by="otu") %>% group_by(sample_id) %>% mutate(rel_abund = count / sum(count)) %>% ungroup() %>% select(-count) This first…
Antonio

1 vote, 1 answer

Looking for an R function that counts number of times two columns appear together

I have a data.frame with many rows. I am trying to produce a new data.frame summarizing the total row count for all combinations of V_ID and N_ID. Below, df1 is an example of my data and df2 is an example of the desired output. df1 <-…
E Norton

1 vote, 2 answers

Collapse and summarize while maintaining most frequent character variable by group

I have a data frame: df <- data.frame(resource = c("gold", "gold", "gold", "silver", "silver", "gold", "silver", "bronze"), amount = c(500, 2000, 4, 8, 100, 2000, 3, 5), unit = c("g", "g", "kg", "ton", "kg", "g", "ton", "kg"), price = c(10, 10,…
Anton

1 vote, 1 answer

calculating count for a column of dates

I want to calculate the mean and standard deviation for the number of dates (or visits) that people have. Sample data are:

id  date
1   2015-02-23
1   2015-04-24
2   2018-05-23
2   2022-12-05
2   2022-12-06
3   2021-05-21

ID1 has 2 visits…
D. Fowler

1 vote, 0 answers

Conditionally concatenate strings in R / tidyverse

I have a dataset that is structured like this:

book  chapter  verse  text
1     1        1      string1
1     1        2      string2
1     2        1      string3
1     2        2      string4
2     1        1      string5
2     1        2      string6
2     2        1      string7
2     2        2      string8

And my intended output…
Laurin Kub

1 vote, 1 answer

group_by() and summarise() keeping the values without grouping

I want to summarise values of two created groups and keep the values of the total sample. What I have so far: data <- structure(list(big_four = c(0L, 0L, 0L, 1L, 1L, 0L), idade_em_2022 = c(46L,38L, 40L, 23L, 27L, 27L), total_de_cooperados = c(8665L,…
RxT

1 vote, 1 answer

How best to calculate relative shares of different columns in R?

Below is the sample data and code. I have two issues. First, I need the indtotal column to be the sum by the twodigit code and to stay constant as shown below. The reason is so that I can do a simple calculation of one column divided by the…
Tim Wilcox

1 vote, 2 answers

R dplyr summarise mean and stdev using group_by

I have a dataframe that looks like this: df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)), "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"), "Type" =…
Jen

1 vote, 1 answer

How to put the results of the summarise() function into the dataframe, using r?

This question follows up on my previous question (how to put the results of summarise() function into the dataframe in r); I don't think I conveyed my question well there, so I have added more details. I made a minimal reproducible example, but my real data is…
yoo

1 vote, 1 answer

How to dissolve the dataset on multiple conditions - R

Consider the following dataset: ID Start time End…