How to split a column if a delimiter exists, else retain the same value using R

Question

My sample data frame looks like below. I need to split the column into 2 based on a period(.) delimiter if exists, else the same value needs to be retained.

df_col1    
               
server2.rty.com
datasserver
server1.rty.network
datasource

I have used the below code to split. This gives me NA for row 2 and 4 values(as there is no delimiter)

df%>%
  extract(df_col1 , into = c("First_col", "Second_col"), "^([^.]+)\\.(.*)")

My expected output is

First_col              Second_col
               
server2                rty.com
datasserver
server1                rty.network
datasource

score 2 · Answer 1 · answered Aug 16 '21 at 08:43

try separate from tidyr

tidyr::separate(data = df, col =  "df_col1",into =  c("First_col", "Second_col"), sep = "\\.", extra = "merge")

output:
    First_col  Second_col
1     server2     rty.com
2 datasserver        <NA>
3     server1 rty.network
4  datasource        <NA>

data:

df <- read.table(text = "df_col1
server2.rty.com
datasserver
server1.rty.network
datasource", header = T)

score 1 · Answer 2 · answered Aug 16 '21 at 09:55

With extract you can use this regex -

tidyr::extract(df, 'df_col1', c("First_col", "Second_col"), '(\\w+)\\.?(.*)?')

#    First_col  Second_col
#1     server2     rty.com
#2 datasserver            
#3     server1 rty.network
#4  datasource

? makes the regex optional so it still captures whatever is available.

score 1 · Answer 3 · answered Aug 16 '21 at 17:09

We could create a delimiter with sub and use read.csv from base R

read.csv(text = sub(".", ",", df$df_col1, fixed = TRUE), header = FALSE, 
     col.names = c("First_col", "Second_col"), fill = TRUE)
    First_col  Second_col
1     server2     rty.com
2 datasserver            
3     server1 rty.network
4  datasource

data

df <- structure(list(df_col1 = c("server2.rty.com", "datasserver", 
"server1.rty.network", "datasource")), class = "data.frame", row.names = c(NA, 
-4L))

score 0 · Answer 4 · answered Aug 16 '21 at 08:42

0

library(dplyr)
library(tidyr)
library(stringr)

df = data.frame(c("server2.rty.com",
"datasserver",
"server1.rty.network",
"datasource"))
names(df) = c("col")

df %>% 
  mutate(col = str_replace(col,"\\.","REPLACE")) %>% 
  separate(col,sep="REPLACE",into = c("col1","col2"))

         col1        col2
1     server2     rty.com
2 datasserver        <NA>
3     server1 rty.network
4  datasource        <NA>

Explanation : str_replace will only replace first occurrence it found from left to right. We need to change the first dot it found to something unique, i named it REPLACE string. After that use separate from tidyr to split the columns

answered Aug 16 '21 at 08:42

Vinson Ciawandy

996
11
26

dont need to replace dots. just use `extra = "merge"` arg, to *only split at most length(into) times*. moreover, what if this unique tag appears in the column? – dy_by Aug 16 '21 at 08:56
I see, i just know separate have such argument. – Vinson Ciawandy Aug 16 '21 at 09:12

How to split a column if a delimiter exists, else retain the same value using R

4 Answers4

data