Questions tagged [summarize]

A dplyr function (named summarise(); summarize() is an accepted alias) that creates a new data frame by summarising data according to the given grouping variables. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
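As a minimal sketch of the behaviour described above, the following compares an ungrouped and a grouped call. The data frame `df` and its columns `group` and `value` are invented for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, NA, 5)
)

# No grouping variables: a single row summarising all observations
overall <- df %>%
  summarise(n = n(), total = sum(value, na.rm = TRUE))

# One grouping variable: one row per group, with one column for the
# grouping variable and one column per summary statistic
by_group <- df %>%
  group_by(group) %>%
  summarise(n = n(), total = sum(value, na.rm = TRUE), .groups = "drop")

print(overall)
print(by_group)
```

Without `na.rm = TRUE`, `sum()` returns `NA` for any group that contains a missing value, so the flag is worth setting explicitly whenever the input may have gaps.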

836 questions
0
votes
1 answer

R summarise by group sum giving NA

I have a data frame like this Observations: 2,190,835 Variables: 13 $ patientid 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489… $ preparationid 1000307, 1000307, 1000307,…
Lotta
  • 1
  • 1
0
votes
1 answer

group by text in columns, look for common entries in two data frames

I'm trying to compare columns from two data frames to extract items that appear in both. Specifically: df1: state group species 1 CA 2 cat, dog, chicken, mouse 2 CA 1 cat 3 NV 1 dog, chicken 4 NV 2 chicken 5 WA 1 chicken, rat, mouse, lion 6 WA 2…
nso921
  • 31
  • 3
0
votes
1 answer

Aggregate data frame by column, filtering on a different column

I want to aggregate some columns of a data frame using a factor (group in the example) but I want to use only the rows with the highest values in a different column (time in the example) df=data.frame(group=c(rep('a',5),rep('b',5)), …
nex
  • 1
  • 1
0
votes
1 answer

Decimal places not showing when using dplyr summarize function in R

I want to calculate the average number of hits by the players in the NL East in 2019 who started the most games at each position. There are 40 players in the NLeast_starters dataset and the average number of hits is 123.75, but when I run my R code…
Metsfan
  • 510
  • 2
  • 8
0
votes
0 answers

How can I summarize several timesteps in R?

I want to draw a heat map showing the operating period of a ventilation system over the year. Since my dataset has 1-minute timesteps, I need to summarize the values to hourly timesteps (1440 values on the y-axis result in a resolution that is too small). So…
Annika
  • 35
  • 3
0
votes
1 answer

How can I loop different variables to the same command

I am trying to loop different variables into the same command: Following is the list of variables and values I want to loop behavior_list <- c("knocked1", "questions1", ...) answer_list <- c(0, 1) answer_label_list <- c("Yes", "No") Following is…
0
votes
1 answer

Populate a new column with the value of another column when the value of a third column

First time posting here, and new to R. I hope I do it right. How can I use R to populate a new column with the value (just the value) of another column's row, conditional on the value of a third column's row? Thank you!
0
votes
1 answer

RStudio (extracted function) works only with old col names

I have a large tibble with 300000 obs like this datetime Temp 1 47650 2000-01-01 01:00:00 -3 2 47650 2000-01-01 01:30:00 -3.1 3 47650 2000-01-01 02:00:00 -3.2 4 47650 2000-01-01…
AloesR2512
  • 11
  • 4
0
votes
0 answers

Condensing Rows Using Count

Currently working with a data set of Reddit comments all taken from Christmas Day, 2017: load('reddit_xmas_2017.RData') reddit %>% print # A tibble: 100,000 x 3 author body created_utc …
PageSim
  • 143
  • 1
  • 1
  • 8
0
votes
1 answer

Combining Rows Using Observation as Condition

Currently doing some analysis on an mpg data set that I believe exists within tidyverse. I am trying to take the large data set, and combine rows to look like the smaller one below. I have tried summarising to combine like model and years to get to…
PageSim
  • 143
  • 1
  • 1
  • 8
0
votes
1 answer

Is it possible to add an exception to summarize(count = n_distinct(x)) in R?

Is it possible to add an exception to summarize(count = n_distinct(x)) in R, while allowing the exception to be counted by the "normal" summarize(count = n()) function? How do you combine the count n() and n_distinct() functions to create a single…
Will M
  • 692
  • 9
  • 20
0
votes
1 answer

How to extract the resulting values from the matrix table to create another SUMX measure

What I am trying to do is to take out from the table below, just the result from Madrid (70,89%) and Barcelona (83,92%) and consolidate both results weighting them according to "total production" measure. The expected result would be the following =…
Rbn
  • 1
  • 1
0
votes
1 answer

How to describe data after multiple imputation using Amelia (which dataset should I use)?

I did multiple imputation using Amelia using the following code binary<- c("Gender", "Diabetes") exclude.from.IMPUTATION<-c( "Serial.ID") NPvars<- c("age", "HDEF","BMI")#a skewed (non-parametric variable a.out <- Amelia::amelia(x =…
Mohamed Rahouma
  • 1,084
  • 9
  • 20
0
votes
3 answers

Fast way to summarize a data frame across columns

I have this data.frame of five possible character states (genotypes): genotypes <- c("0/0","1/1","0/1","1/0","./.") library(dplyr) set.seed(1) df <- do.call(rbind, lapply(1:100, function(i) matrix(sample(genotypes, 30, replace = T), nrow = 1,…
dan
  • 6,048
  • 10
  • 57
  • 125
0
votes
1 answer

Using dplyr summarise_if() with a predicate

I want to calculate mean of x1 and x2 on days where the ratio of sum(is.NA) and all observations is >= 0.5 or else NA. Data: library(lubridate) library(dplyr) x = seq(length.out= 10) x[seq(1,11,5)] <- NA data = data.frame( tseq = seq(from =…
tRash
  • 131
  • 1
  • 11