I have a dataset with a column like the example below:

Match1
data.events.Brisbane.Broncos_Parramatta.Eels.sites.sportsbet.h2h 
data.events.Melbourne.Storm_North.Queensland.Cowboys.sites.sportsbet.h2h

I want the end result to be:

Team 1            Team 2
Brisbane Broncos  Parramatta Eels
Melbourne Storm   North Queensland

I tried to split and paste, but it is not really working. Please help!

ThisWeek$Match1 <- unlist(strsplit(ThisWeek$Match1, "_", "\\."))
paste(ThisWeek$Match1[3], ThisWeek$Match1[4])
Jaap

2 Answers

Two possible solutions using the data.table-package:

# load the package and convert 'mydf' to a 'data.table'
library(data.table)
setDT(mydf)

# option 1a:
mydf[, tstrsplit(Match1, '_')
     ][, .(Team1 = trimws(gsub('data|events|\\.',' ',V1)),
           Team2 = trimws(gsub('sites|sportsbet|h2h|\\.',' ',V2)))][]

# option 1b:
mydf[, setNames(tstrsplit(Match1, '_'), paste0('Team',1:2))
     ][, lapply(.SD, function(x) trimws(gsub('data|events|sites|sportsbet|h2h|\\.',' ',x)))][]

# option 2:
mydf[, tstrsplit(Match1, '_')
     ][, .(Team1 = sapply(strsplit(V1, '\\.'), function(x) paste(tail(x, -2), collapse = ' ')),
           Team2 = sapply(strsplit(V2, '\\.'), function(x) paste(head(x, -3), collapse = ' ')))][]

An alternative solution in base R:

l <- strsplit(mydf$Match1, '_')
l <- lapply(l, strsplit, split = '\\.')
l <- lapply(l, function(x) list(paste(tail(x[[1]], -2), collapse = ' '),
                                paste(head(x[[2]], -3), collapse = ' ')))
l <- lapply(l, as.data.frame.list, col.names = paste0('Team',1:2))
do.call(rbind, l)

which all give (data.table-output shown):

              Team1                    Team2
1: Brisbane Broncos          Parramatta Eels
2:  Melbourne Storm North Queensland Cowboys

Used data:

mydf <- structure(list(Match1 = c("data.events.Brisbane.Broncos_Parramatta.Eels.sites.sportsbet.h2h", 
                                  "data.events.Melbourne.Storm_North.Queensland.Cowboys.sites.sportsbet.h2h")),
                  .Names = "Match1", class = "data.frame", row.names = c(NA, -2L))
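
Another base R possibility is a single regular expression with two capture groups. This is only a sketch: it assumes every string starts with `data.events.`, ends with `.sites.sportsbet.h2h`, and contains exactly one underscore between the two team names. The names `pattern`, `clean`, and `result` are made up for illustration.

```r
# same data as above, rebuilt here so the snippet runs on its own
mydf <- data.frame(
  Match1 = c("data.events.Brisbane.Broncos_Parramatta.Eels.sites.sportsbet.h2h",
             "data.events.Melbourne.Storm_North.Queensland.Cowboys.sites.sportsbet.h2h"),
  stringsAsFactors = FALSE
)

# strip stray whitespace first, otherwise the '$' anchor will not match
x <- trimws(mydf$Match1)

# group 1 = everything between the prefix and '_', group 2 = everything up to the suffix
pattern <- "^data\\.events\\.(.+)_(.+)\\.sites\\.sportsbet\\.h2h$"

# helper (hypothetical name): turn the remaining dots into spaces
clean <- function(s) gsub(".", " ", s, fixed = TRUE)

result <- data.frame(
  Team1 = clean(sub(pattern, "\\1", x)),
  Team2 = clean(sub(pattern, "\\2", x)),
  stringsAsFactors = FALSE
)
result
```

Note that `sub()` returns the input unchanged when the pattern does not match, so strings with a different prefix or suffix would pass through unmodified and need a separate check.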
Jaap

One other solution using dplyr, tidyr, and stringr:

library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  separate(Match1, c("Team1", "Team2"), "_") %>% 
  mutate_all(str_replace_all, "\\.", " ") %>% 
  mutate(Team1 = word(Team1, 3, 4),
         Team2 = word(Team2, 1, 2))

which gives:

             Team1            Team2
1 Brisbane Broncos  Parramatta Eels
2  Melbourne Storm North Queensland

Data:

Lines <- "Match1
data.events.Brisbane.Broncos_Parramatta.Eels.sites.sportsbet.h2h 
data.events.Melbourne.Storm_North.Queensland.Cowboys.sites.sportsbet.h2h"

df <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
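
Because `word(Team2, 1, 2)` keeps only the first two words, "North Queensland Cowboys" is cut to "North Queensland", which matches the output requested in the question. If the full team name is wanted instead, one possible variant is `tidyr::extract()` with two capture groups. This is a rough sketch assuming the fixed `data.events.` prefix and a `.sites.` suffix on every row, and dplyr 1.0 or later for `across()`; `teams` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# same data as above, rebuilt here so the snippet runs on its own
df <- data.frame(
  Match1 = c("data.events.Brisbane.Broncos_Parramatta.Eels.sites.sportsbet.h2h",
             "data.events.Melbourne.Storm_North.Queensland.Cowboys.sites.sportsbet.h2h"),
  stringsAsFactors = FALSE
)

teams <- df %>%
  # capture the text before and after the underscore, dropping prefix and suffix
  tidyr::extract(Match1, c("Team1", "Team2"),
                 regex = "^data\\.events\\.(.+)_(.+)\\.sites\\..*$") %>%
  # replace the remaining dots with spaces in both columns
  mutate(across(everything(), ~ gsub(".", " ", .x, fixed = TRUE)))

teams
```

This does not depend on how many words a team name has, at the cost of depending on the exact prefix and suffix.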
tyluRp