Questions tagged [summarize]

A dplyr function (named summarise(), with summarize() as a synonym) that creates a new data frame by reducing each group of the input, as defined by the grouping variables, to summary values. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
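As a rough illustration of the behaviour described above, here is a minimal sketch. The `scores` data frame and its column names are hypothetical, made up for this example, and not taken from any question on this page.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only)
scores <- data.frame(
  team  = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 40)
)

# No grouping variables: a single row summarising all observations
overall <- scores %>%
  summarise(mean_score = mean(score), n = n())

# Grouped: one row per team, with one column for the grouping
# variable and one column per summary statistic
by_team <- scores %>%
  group_by(team) %>%
  summarise(mean_score = mean(score), n = n())
```

Here `overall` should have one row and `by_team` one row per distinct `team`, each with the columns `mean_score` and `n`.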

836 questions
2
votes
1 answer

R: dplyr conditional summarize and recode values column-wise

I want to recode the values within selected columns based on a summary statistic of each column (for example, the column's median). For example, if cell value < median(df$variable), recode to 1; if cell value = median(df$variable), recode to 0; if…
KP1
  • 129
  • 2
  • 8
2
votes
1 answer

PowerBI Dynamic binning (ranges change) based on value of measure

I’m trying to represent some continuous data via binning. Continuous weighting data of an area should be binned as: VeryHigh, High, Low, VeryLow. The weighting values are based on an interaction between certain Types of events grouped by an Area and…
jayuu
  • 443
  • 1
  • 4
  • 17
2
votes
2 answers

Summarise multiple Columns In R Based On Top 5 Values

I am trying to summarise multiple columns based on the top 5 values of each variable in R. An example of the data is below.
df
ID  A    B   C   D
A   325  68  8   8
B   308  85  2   7
B   342  99  6   2
A   439  83  9   6
A   278  60  10  2
A   367  78  …
2
votes
2 answers

Produce Bar Chart of Filtered Columns with ggplot2

Could you please tell me how I can produce the graph as shown? I want to select only the top 2 neighbourhoods for each city (top 2 neighbourhoods based on the median housing prices) and show their median prices. Of course, it is much nicer if the…
Tony Flager
  • 95
  • 1
  • 8
2
votes
1 answer

How to calculate SD by group in R, without losing columns still needed for plotting in ggplot2?

I have a dataset of 27 scenarios where A, B and C are input values to a model, and value is the outcome of a variable. Now I want to make a grouped barplot with ggplot (value on y, factor B on x, fill by A). I want to make…
2
votes
1 answer

Tidyverse summarize: a new summary variable within each iteration

I have a problem that I just can't seem to solve. Say I have a loop like the one provided by the minimal working example below. What I want R to do is create a new "summary" (in this example "dogfood_items", "catfood_items", and…
minimouse
  • 131
  • 10
2
votes
2 answers

sd function returns NA when using group_by() and summarise() in dplyr (no NA values in df)

I've got a df with a binary numeric response variable (0 or 1) and several response variables. I am trying to create a table that groups by type (a 3 level variable) and step (7 levels). I want the mean response and standard deviation for each type…
MatthewQMLing
  • 35
  • 2
  • 5
2
votes
2 answers

Summarize variables beside

I am looking for a solution to my problem. So far I can only solve it by rearranging manually. Example code:
library(dplyr)
set.seed(1)
Data <- data.frame(
  W = sample(1:10),
  X = sample(1:10),
  Y = sample(c("yes", "no"), 10,…
2
votes
3 answers

look for common entries in two data frames

df1:
  state group species
1 CA    2     cat, dog, chicken, mouse
2 CA    1     cat
3 NV    1     dog, chicken
4 NV    2     chicken
5 WA    1     chicken, rat, mouse, lion
6 WA    2     dog, cat
7 WA    3     dog, chicken
8 WA    4     cat, chicken
df2:
  state special_species
1 CA    cat
2 CA    chicken
3 CA…
nso921
  • 31
  • 3
2
votes
1 answer

Find log difference using values based on max difference

This should be pretty simple but I'm having a hard time with it. I've created a summarised output that includes the maximum difference in values spanning 15 values (lag = 15) using the base diff() function producing the summarized column…
TheSciGuy
  • 1,154
  • 11
  • 22
2
votes
4 answers

What is the simplest way to aggregate rows (sum) by column values in the following type of data frame in R?

index  type.x  type.y  col3  col4
1      a       m       20    25
2      b       m       30    28
3      a       m       15    555
3      a       n       20    555
4      a       m       666   10
4      b       m       666   …
StivJ
  • 55
  • 5
2
votes
3 answers

Summarizing using dplyr with a for loop

I would like to summarise each of my independent variables (columns) with my target variable using dplyr in a for loop. This is my main dataframe: contract_ID Asurion Variable_1 Variable_2 Variable_3 1 Y …
2
votes
2 answers

Summarizing columns using a vector with dplyr

I want to calculate the mean of certain columns (names stored in a vector), while grouping by another column. Here is a reproducible example:
Cities <- c("London", "New_York")
df <- data.frame(Grade = c(rep("Bad", 2), rep("Average", 4), rep("Good", 4)), …
Ali
  • 1,048
  • 8
  • 19
2
votes
2 answers

Dplyr group_by and summarise, but keep non numeric variables

I have a dataset in long format, where I add up values for different groups. Some variables are factor variables and should be kept in the result.
mtcars$model <- as.factor(rownames(mtcars))
longmtcars <- rbind(mtcars, mtcars,…
user7353167
2
votes
2 answers

How can I pull a group-based vector to pass to a function within dplyr's summarize or mutate?

I am trying to create a summary table of accuracy, sensitivity, and specificity using the AUC function within the psych package. I would like to define the input vector (t, a 4 x 1 vector) for each level of the grouped variable. What I have tried…
JLC
  • 661
  • 7
  • 16