I would like to use tidyr::separate with a negative lookbehind and keep the separator. My attempt below drops the first capital letter of the last name.

There is an existing answer that does not use a negative lookbehind, and I can't figure out how to adapt it to one:
How do I split a string with tidyr::separate in R and retain the values of the separator string?

tidyr::tibble(myname = c("HarlanNelson")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)[[:upper:]]")
#> # A tibble: 1 × 2
#>   first  last 
#>   <chr>  <chr>
#> 1 Harlan elson

Created on 2022-10-20 by the reprex package (v2.0.1)
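
To see why the "N" disappears, it may help to look at what each pattern actually matches. The following is a minimal base R sketch (the variable names `consumed` and `zero_width` are just illustrative): the original pattern matches the capital letter itself, so the split removes it, while wrapping the letter in a lookahead gives a zero-width match at the same position.

```r
x <- "HarlanNelson"

# Original separator: the match is the capital letter itself,
# so splitting on it removes that letter from the result.
consumed <- regmatches(x, regexpr("(?<!^)[[:upper:]]", x, perl = TRUE))
consumed
#> [1] "N"

# Same idea with the capital moved into a lookahead: the match sits
# just before "N" (position 7) and has length 0, so nothing is consumed.
zero_width <- regexpr("(?<!^)(?=[[:upper:]])", x, perl = TRUE)
as.integer(zero_width)
#> [1] 7
attr(zero_width, "match.length")
#> [1] 0
```

A separator with match length 0 marks only a position in the string, which is what lets the capital letter stay with the last name.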

tidyr::tibble(myname = c("HarlanNelson", "Another Person")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 2 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person

tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 3 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person
#> 3 someone      else

Harlan Nelson

1 Answer

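This is what I figured out. It is really just working through the answer from @cameron at https://stackoverflow.com/a/51415101/4629916 and applying it to my problem.

The separator has to be zero-width: a lookbehind followed by a lookahead matches the position between the two names without consuming the capital letter. Names separated by a space are then handled by a second separate call on the first column.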

tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<=[[:lower:]])(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", fill = 'right', extra = 'merge') |> 
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |>  
  dplyr::select(-last2)
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else

The same approach works with the negative lookbehind from the question, as long as the capital letter is wrapped in a lookahead so that it is not consumed:

tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", extra = 'merge', fill = 'right') |> 
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |> 
  dplyr::select(-last2)
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else
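
As the warning in the question shows, only the first element of sep is used when it has length greater than one, which is why two separate calls are needed above. One possible variation, sketched here in base R only and not taken from the linked answer, is to insert a space at the lower-to-upper boundary first, so that a single space separator covers every row. The names `normalised` and `parts` are illustrative, and the sketch assumes each name splits into exactly two pieces.

```r
x <- c("HarlanNelson", "Another Person", "someone else")

# Insert a space at the zero-width position between a lowercase
# and an uppercase letter; strings without that boundary are unchanged.
normalised <- sub("(?<=[[:lower:]])(?=[[:upper:]])", " ", x, perl = TRUE)

# Every string now uses the same separator, so one split is enough.
# Assumes exactly two pieces per name.
parts <- do.call(rbind, strsplit(normalised, " ", fixed = TRUE))
colnames(parts) <- c("first", "last")
parts
```

With the strings normalised this way, a single tidyr::separate call with sep = " " should be enough in a pipeline, though I have only checked the base R version.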
Harlan Nelson