I have a tbl_df that has several columns with multiple values in them. I am looking to use the values in those columns to create several new columns. After that, I'm looking to summarize the new columns.
One way I can go about it is to create several ifelse calls within a mutate, but that seems inefficient. Is there a better way to go about this? I'm thinking that there is probably a dplyr and/or tidyr based solution.
An example of what I'm looking to do is below. It's only a sample of the data and columns, and it doesn't contain all of the columns that I'm looking to create. The summary table will have some sum and mean based columns.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tibble::tribble(
  ~type, ~bb_type,      ~description,
  "B",   NA,            "ball",
  "S",   NA,            "foul",
  "X",   "line_drive",  "hit_into_play_no_out",
  "S",   NA,            "swinging_strike",
  "S",   NA,            "foul",
  "X",   "ground_ball", "hit_into_play",
  "S",   NA,            "swinging_strike",
  "X",   "fly_ball",    "hit_into_play_score",
  "B",   NA,            "ball",
  "S",   NA,            "foul"
)
df <- df %>%
  mutate(ground_ball = ifelse(bb_type == "ground_ball", 1, 0),
         fly_ball = if_else(bb_type == "fly_ball", 1, 0),
         X = if_else(type == "X", 1, 0),
         # not sure if this is the best way to go about counting values that start with swinging to sum later
         swinging_strike = grepl("^swinging", description))
df
#> # A tibble: 10 x 7
#> type bb_type description ground_ball fly_ball X swinging_strike
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
#> 1 B <NA> ball NA NA 0 FALSE
#> 2 S <NA> foul NA NA 0 FALSE
#> 3 X line_drive hit_into_play_no… 0 0 1 FALSE
#> 4 S <NA> swinging_strike NA NA 0 TRUE
#> 5 S <NA> foul NA NA 0 FALSE
#> 6 X ground_ba… hit_into_play 1 0 1 FALSE
#> 7 S <NA> swinging_strike NA NA 0 TRUE
#> 8 X fly_ball hit_into_play_sc… 0 1 1 FALSE
#> 9 B <NA> ball NA NA 0 FALSE
#> 10 S <NA> foul NA NA 0 FALSE
summary_df <- df %>%
  summarize(n = n(),
            fly_ball = sum(fly_ball, na.rm = TRUE),
            ground_ball = sum(ground_ball, na.rm = TRUE))
summary_df
#> # A tibble: 1 x 3
#> n fly_ball ground_ball
#> <int> <dbl> <dbl>
#> 1 10 1 1
In summary, I'm looking to do the following:
1. Create new columns for all of the values in bb_type and type that count them.
2. Create a new column that counts the number of values that start with swinging in the description column. I'd like to see an example that chooses another text string from that column and creates a new column with the count as an additional example, e.g. ball.
3. How would I choose my own names while doing what I'm looking to achieve in 1 and 2? Would I have to simply use dplyr::rename after the fact?
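
For reference, here is a rough sketch of one direction the three items above could take. It is not a definitive answer. It assumes tidyr 1.0.0 or later (for pivot_wider) and rebuilds the example data so it runs on its own. The object names type_counts, bb_counts and pattern_counts, the column names swings and balls, and the "type_" prefix are all made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- tibble::tribble(
  ~type, ~bb_type,      ~description,
  "B",   NA,            "ball",
  "S",   NA,            "foul",
  "X",   "line_drive",  "hit_into_play_no_out",
  "S",   NA,            "swinging_strike",
  "S",   NA,            "foul",
  "X",   "ground_ball", "hit_into_play",
  "S",   NA,            "swinging_strike",
  "X",   "fly_ball",    "hit_into_play_score",
  "B",   NA,            "ball",
  "S",   NA,            "foul"
)

# Item 1: count every value of type, then spread the counts into one column per value.
# names_prefix (illustrative choice) keeps the new names distinguishable.
type_counts <- df %>%
  count(type) %>%
  pivot_wider(names_from = type, values_from = n, names_prefix = "type_")

# Same idea for bb_type, dropping the NA rows first so no NA column is created
bb_counts <- df %>%
  filter(!is.na(bb_type)) %>%
  count(bb_type) %>%
  pivot_wider(names_from = bb_type, values_from = n)

# Items 2 and 3: summing a logical vector counts the matches, and the name on the
# left of `=` is whatever you choose, so no rename step is needed afterwards.
pattern_counts <- df %>%
  summarize(
    n      = n(),
    swings = sum(grepl("^swinging", description)),
    balls  = sum(grepl("^ball", description))
  )

# Each piece is a one-row tibble, so they can be bound side by side
summary_df <- bind_cols(pattern_counts, type_counts, bb_counts)
summary_df
```

With this approach the names for item 1 come from the data (optionally with a prefix), while the names for item 2 are set directly inside summarize, so dplyr::rename would only be needed for the pivoted columns if the data values themselves are not the names you want.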