I would like to separate a field using tidyr::separate, keep the separator, and use a negative lookbehind

Question

I would like to use tidyr::separate with a negative lookbehind and keep the separator. My attempt below drops the first capital letter of the last name.

There is an existing answer that does not use a negative lookbehind, and I can't figure out how to adapt it to use one:
"How do I split a string with tidyr::separate in R and retain the values of the separator string?"

tidyr::tibble(myname = c("HarlanNelson")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)[[:upper:]]")
#> # A tibble: 1 × 2
#>   first  last 
#>   <chr>  <chr>
#> 1 Harlan elson

Created on 2022-10-20 by the reprex package (v2.0.1)

tidyr::tibble(myname = c("HarlanNelson", "Another Person")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 2 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person


tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 3 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person
#> 3 someone      else


score 0 · Answer 1 · answered Oct 21 '22 at 14:05

This is what I figured out. It is really just me working through the answer from @cameron at https://stackoverflow.com/a/51415101/4629916 and applying it to my problem.
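
As a quick check of why the pattern in the question loses the "N", here is a small base R sketch (not part of the linked answer; the variable names `x`, `consuming` and `zero_width` are only for illustration). It compares the length of the match found by the original pattern with the length of the match found by a lookbehind/lookahead pair. Whatever the pattern matches is treated as the separator and removed, so a match of length 1 eats the capital letter, while a match of length 0 removes nothing.

```r
x <- "HarlanNelson"

# Pattern from the question: the capital letter itself is the match,
# so it is treated as the separator and dropped from the result.
consuming <- gregexpr("(?<!^)[[:upper:]]", x, perl = TRUE)[[1]]
attr(consuming, "match.length")
#> [1] 1

# Lookaround-only pattern: the match is the empty position between
# "n" and "N", so splitting there removes no characters.
zero_width <- gregexpr("(?<=[[:lower:]])(?=[[:upper:]])", x, perl = TRUE)[[1]]
attr(zero_width, "match.length")
#> [1] 0
```

Both patterns find the same position in the string (character 7); the only difference is whether the capital letter is part of the match.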

tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<=[[:lower:]])(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", fill = 'right', extra = 'merge') |> 
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |>  
  dplyr::select(-last2)
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else

The negative lookbehind from the question also works, as long as the capital letter is matched by a lookahead instead of being consumed:

tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", extra = 'merge', fill = 'right') |> 
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |> 
  dplyr::select(-last2)
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else
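
A possible simplification, offered as a sketch and only checked against these three example names: since `sep` takes a single regular expression, the two cases can be combined with `|` so that one call to `tidyr::separate` is enough. The name `split_pattern` is just for illustration. The first alternative consumes a run of whitespace; the second is the zero-width position between a lowercase and an uppercase letter, so the capital letter is kept.

```r
# Either whitespace (removed) or the empty position between a lowercase
# and an uppercase letter (nothing removed).
split_pattern <- "\\s+|(?<=[[:lower:]])(?=[[:upper:]])"

result <- tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>
  tidyr::separate(col = myname, into = c("first", "last"),
                  sep = split_pattern, extra = "merge", fill = "right")

result
```

For the three names above this should give the same first/last columns as the two-step versions, without the temporary `last2` column. With `extra = "merge"` only the first split point is used, so a name with several capitals or spaces keeps everything after the first split in `last`.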
