I am trying to analyze some daily fantasy lineups and need to split the lineup column into multiple columns, one for each position.

I would like the delimiters to be the positions ("P", "C", "1B", "2B", "SS", "3B", "OF").

I have tried str_split and separate, but I am not sure how to get the results into separate columns and then order them.

Here is the column that I want to split:

Lineup
1B Justin Bour P José Berríos P Justin Verlander 2B Kiké Hernández OF Cody Bellinger OF Joc Pederson C Austin Barnes SS Corey Seager OF Corey Dickerson 3B Jung Ho Kang
P José Berríos OF Albert Almora Jr. SS Javier Báez 3B Kris Bryant 2B Ben Zobrist 1B Anthony Rizzo OF Cody Bellinger OF Joc Pederson C Austin Barnes P Eric Lauer

I would like this to look like:

P             | P                 | C               | 1B              | 2B .. and so on...
------------- | ----------------- | --------------- | --------------- |
José Berríos  | Justin Verlander  | Austin Barnes   | Justin Bour     |
José Berríos  | Eric Lauer        | Austin Barnes   | Anthony Rizzo   |
    What's the source of this data? It's not hugely complicated to use some regex to parse it into a usable format, but it looks as if it may have been pulled from someplace where it was already structured. If that's the case, it would be better to grab it from there and keep the structure intact. – Ritchie Sacramento Mar 29 '19 at 00:35

1 Answer

Here is one option:

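library(tidyverse)

# All position labels, combined into one alternation: (P|C|1B|...)
pos <- c("P", "C", "1B", "2B", "3B", "SS", "OF")
pat <- sprintf("(%s)", paste(pos, collapse = "|"))

unlist(str_split(Lineup, "\n")) %>%    # one string per lineup
    # split on the whitespace directly after or directly before a position label
    str_split(sprintf("((?<=(%s\\b))\\s|\\s(?=(%s\\b)))", pat, pat)) %>%
    # tokens alternate position, player: fill a 2-column matrix row by row
    map(~as_tibble(matrix(.x, ncol = 2, byrow = TRUE,
                          dimnames = list(NULL, c("V1", "V2")))) %>%
        group_by(V1) %>%
        mutate(n = row_number()) %>%    # number repeated positions: P_1, P_2, ...
        ungroup() %>%
        unite(col, V1, n, sep = "_") %>%
        spread(col, V2)) %>%
    bind_rows()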
## A tibble: 2 x 10
#  `1B_1`   `2B_1`   `3B_1`   C_1     OF_1    OF_2   OF_3   P_1    P_2     SS_1
#  <chr>    <chr>    <chr>    <chr>   <chr>   <chr>  <chr>  <chr>  <chr>   <chr>
#1 Justin … Kiké He… Jung Ho… Austin… Cody B… Joc P… Corey… José … Justin… Corey…
#2 Anthony… Ben Zob… Kris Br… Austin… Albert… Cody … Joc P… José … Eric L… Javie…

Explanation: We first define all positions (note that you forgot "SS" as a delimiter for the possible positions) and turn them into an OR regular expression in pat. We then split the input string Lineup first on "\n" (one string per lineup), and then on any whitespace that directly follows or directly precedes a position matched by pat, so each lineup becomes an alternating sequence of position and player name. The rest is fairly basic tidyverse reshaping. Because the same position can occur multiple times and positions should be column names as per your design, we need to "unique-ify" positions by adding a number (P_1, P_2, and so on).
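To make the splitting step concrete, here is a minimal sketch on a single shortened lineup. The names split_rx, one_line, tokens and pairs are only for illustration and are not part of the answer's code; the regular expression is built the same way as above.

```r
library(stringr)

# Position labels and the OR pattern, as in the answer
pos <- c("P", "C", "1B", "2B", "3B", "SS", "OF")
pat <- sprintf("(%s)", paste(pos, collapse = "|"))

# Whitespace directly after a position label, or directly before one
split_rx <- sprintf("((?<=(%s\\b))\\s|\\s(?=(%s\\b)))", pat, pat)

# Hypothetical shortened lineup, using names from the question
one_line <- "1B Justin Bour P Justin Verlander OF Joc Pederson"

# Splitting gives alternating position / player tokens
tokens <- str_split(one_line, split_rx)[[1]]
print(tokens)

# Filling a 2-column matrix row by row pairs each position with its player
pairs <- matrix(tokens, ncol = 2, byrow = TRUE)
print(pairs)
```

The trailing \\b is what keeps names such as "Pederson" or "Cody" from being treated as the positions "P" or "C": the label has to end at a word boundary.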


Sample data

Lineup <-
"1B Justin Bour P José Berríos P Justin Verlander 2B Kiké Hernández OF Cody Bellinger OF Joc Pederson C Austin Barnes SS Corey Seager OF Corey Dickerson 3B Jung Ho Kang
P José Berríos OF Albert Almora Jr. SS Javier Báez 3B Kris Bryant 2B Ben Zobrist 1B Anthony Rizzo OF Cody Bellinger OF Joc Pederson C Austin Barnes P Eric Lauer"
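As a variation, and only as a sketch, the same reshape could be written with pivot_wider() in place of spread(). This assumes tidyr 1.0.0 or later; lineup_demo is a hypothetical, shortened version of the sample data, and the helper names (split_rx, tokens, wide) are for illustration only. Unlike spread(), pivot_wider() keeps columns in order of first appearance instead of sorting them.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical shortened sample: two lineups, separated by a newline
lineup_demo <- paste(
  "1B Justin Bour P Justin Verlander OF Cody Bellinger OF Joc Pederson",
  "P Eric Lauer 1B Anthony Rizzo OF Cody Bellinger OF Joc Pederson",
  sep = "\n"
)

pos <- c("P", "C", "1B", "2B", "3B", "SS", "OF")
pat <- sprintf("(%s)", paste(pos, collapse = "|"))
split_rx <- sprintf("((?<=(%s\\b))\\s|\\s(?=(%s\\b)))", pat, pat)

# One character vector of alternating position / player tokens per lineup
tokens <- str_split(str_split(lineup_demo, "\n")[[1]], split_rx)

wide <- lapply(tokens, function(tok) {
  tibble(
    position = tok[c(TRUE, FALSE)],  # odd elements: position labels
    player   = tok[c(FALSE, TRUE)]   # even elements: player names
  ) %>%
    group_by(position) %>%
    # number repeated positions within a lineup: OF_1, OF_2, ...
    mutate(slot = paste(position, row_number(), sep = "_")) %>%
    ungroup() %>%
    select(slot, player) %>%
    pivot_wider(names_from = slot, values_from = player)
}) %>%
  bind_rows()

print(wide)
```

Each lineup becomes one row, and bind_rows() lines the rows up by column name, so a position missing from one lineup would show up as NA there.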
Maurits Evers