0

My sample data frame looks like below. I need to split the column into 2 based on a period(.) delimiter if exists, else the same value needs to be retained.

df_col1    
               
server2.rty.com
datasserver
server1.rty.network
datasource

I have used the below code to split. This gives me NA for row 2 and 4 values(as there is no delimiter)

df%>%
  extract(df_col1 , into = c("First_col", "Second_col"), "^([^.]+)\\.(.*)")

My expected output is

First_col              Second_col
               
server2                rty.com
datasserver
server1                rty.network
datasource

MSM
  • 69
  • 7

4 Answers4

2

try separate from tidyr

tidyr::separate(data = df, col =  "df_col1",into =  c("First_col", "Second_col"), sep = "\\.", extra = "merge")

output:
    First_col  Second_col
1     server2     rty.com
2 datasserver        <NA>
3     server1 rty.network
4  datasource        <NA>

data:

df <- read.table(text = "df_col1
server2.rty.com
datasserver
server1.rty.network
datasource", header = T)
dy_by
  • 1,061
  • 1
  • 4
  • 13
1

With extract you can use this regex -

tidyr::extract(df, 'df_col1', c("First_col", "Second_col"), '(\\w+)\\.?(.*)?')

#    First_col  Second_col
#1     server2     rty.com
#2 datasserver            
#3     server1 rty.network
#4  datasource            

? makes the regex optional so it still captures whatever is available.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

We could create a delimiter with sub and use read.csv from base R

read.csv(text = sub(".", ",", df$df_col1, fixed = TRUE), header = FALSE, 
     col.names = c("First_col", "Second_col"), fill = TRUE)
    First_col  Second_col
1     server2     rty.com
2 datasserver            
3     server1 rty.network
4  datasource         

data

df <- structure(list(df_col1 = c("server2.rty.com", "datasserver", 
"server1.rty.network", "datasource")), class = "data.frame", row.names = c(NA, 
-4L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0
library(dplyr)
library(tidyr)
library(stringr)

df = data.frame(c("server2.rty.com",
"datasserver",
"server1.rty.network",
"datasource"))
names(df) = c("col")

df %>% 
  mutate(col = str_replace(col,"\\.","REPLACE")) %>% 
  separate(col,sep="REPLACE",into = c("col1","col2"))
         col1        col2
1     server2     rty.com
2 datasserver        <NA>
3     server1 rty.network
4  datasource        <NA>

Explanation : str_replace will only replace first occurrence it found from left to right. We need to change the first dot it found to something unique, i named it REPLACE string. After that use separate from tidyr to split the columns

Vinson Ciawandy
  • 996
  • 11
  • 26