I have a character column that encodes a location. The location could be of the form "city, state, country" or "state, country", or "country". I'd like to split these up into three columns but to ensure that if there are less than three elements each of the values goes into the correct state or country column.
Here is my data and what I've tried:
tib <- tribble(~obs, ~location,
1, "Miami, Florida, United States",
2, "Astrakhan Oblast, Russia",
3, "Mozambique")
separate(tib, location, c("city", "state", "country"), ", ")
Result:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 Astrakhan Oblast Russia NA
3 3 Mozambique NA NA
In some sense, I'd like to run the separate
function in reverse order so that the result looks as follows:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
Update:
This is a viable option but I was hoping for something simpler:
tib %>% mutate(country = str_extract(location, "[A-Za-z ]+$"),
state = str_extract(location, "(?<=\\,)[A-Za-z ]+(?=\\,)"),
city = str_extract(location, "^[A-Za-z ]+(?=\\,)"))