I have a column of full names that should be split into three columns on spaces. The problem is that some full names contain more than three words, and the fourth and later words should not be dropped but appended to the third part.
For instance, "Abdullaeva Mehseti Nuraddin Kyzy" should be separated as:

| Abdullaeva | Mehseti | Nuraddin Kyzy | 

I tried to split the column with the tidyr package as follows, but this way the third part contains only the one word after the second space.

df <- df %>%
    separate('FULL_NAME', c("1st_part", "2d_part", "3d_part"), sep = " ")

Any help will be appreciated.

shA.t

2 Answers

Use the extra argument of separate():

# dummy data
df1 <- data.frame(x = c(
  "some name1",
  "justOneName",
  "some three name",
  "Abdullaeva Mehseti Nuraddin Kyzy"))

library(tidyr)
library(dplyr)

df1 %>% 
  separate(x, c("a1", "a2", "a3"), extra = "merge")
#            a1      a2            a3
# 1        some   name1          <NA>
# 2 justOneName    <NA>          <NA>
# 3        some   three          name
# 4  Abdullaeva Mehseti Nuraddin Kyzy
# Warning message:
#   Too few values at 2 locations: 1, 2 

From the manual:

extra

If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:
- "warn" (the default): emit a warning and drop extra values.
- "drop": drop any extra values without a warning.
- "merge": only splits at most length(into) times
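As a minimal sketch of how the options combine, the call below reuses the dummy data from this answer and adds two arguments that are my own additions, not part of the original call: sep = " ", so that only spaces act as separators (the default sep splits on any run of non-alphanumeric characters, which would also break hyphenated names), and fill = "right", which pads short names with NA without the "Too few values" warning. The name res is only illustrative.

```r
library(tidyr)

# same dummy data as above
df1 <- data.frame(x = c(
  "some name1",
  "justOneName",
  "some three name",
  "Abdullaeva Mehseti Nuraddin Kyzy"))

res <- separate(df1, x, c("a1", "a2", "a3"),
                sep = " ",        # assumption: split on spaces only
                extra = "merge",  # everything after the second space stays in a3
                fill = "right")   # pad missing pieces with NA, no warning
res
```

With extra = "merge" the fourth and later words end up in a3, and fill = "right" only changes how rows with fewer than three words are handled.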

zx8754

Since you said that this dataset only has name1, name2 and last name, you can also use str_split_fixed from stringr, i.e.

setNames(data.frame(stringr::str_split_fixed(df1$x, ' ', 3)), paste0('a', 1:3))

which gives

           a1      a2            a3
1        some   name1
2 justOneName
3        some   three          name
4  Abdullaeva Mehseti Nuraddin Kyzy

Note that you can fill the empty slots with NA as per usual.
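One way to do that fill, sketched under the assumption that an empty string never occurs as a legitimate name part: replace "" in the character matrix returned by str_split_fixed before building the data frame. The names parts and res are only illustrative.

```r
library(stringr)

# same dummy data as in the other answer
df1 <- data.frame(x = c(
  "some name1",
  "justOneName",
  "some three name",
  "Abdullaeva Mehseti Nuraddin Kyzy"))

# split into at most 3 pieces; the remainder stays in the third piece
parts <- str_split_fixed(df1$x, " ", 3)

# str_split_fixed pads short rows with "", so turn those cells into NA
parts[parts == ""] <- NA

res <- setNames(data.frame(parts), paste0("a", 1:3))
res
```

Doing the replacement on the matrix avoids looping over columns after the data frame has been built.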

Sotos