I have a data.frame column with more than 3,000 strings that I would like to split into separate columns. The strings are irregular but follow a pattern. Here are some examples, followed by what I would like them converted into.
00700/Z14P120:xhkg
03988/Z14C3.2:xhkg
6A/F15C0.905:xcme
ADS/X14P56:xeur
AX1/X14P375:xams
BIDU/28X14C250:xcbf
ES/F15C1960:xcme
FUR/M16P8:xams
00700 | P | 120
03988 | C | 3.2
6A | C | 0.905
ADS | P | 56
AX1 | P | 375
BIDU | C | 250
ES | C | 1960
FUR | P | 8
I think this covers all possible lengths and value types for each substring.
The first new column should overwrite the input column, and the other two columns should fill the blanks in existing columns of the same data.frame.
Another complication is that some data.frame rows are already formatted correctly; a separate column identifies the rows that are not. Below is a piece of the table as .CSV output.
Final solution: Replacing the values in the existing columns turned out to be more difficult than anticipated, due to issues with NAs, classes and registration of row numbers. I therefore ended up creating temp columns and replacing each entire column, in this fairly ugly and inefficient way. The code provided by Ananda Mahto does, however, work brilliantly.
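# Split every code into symbol, put/call letter and strike (Ananda Mahto's regex).
ETO <- as.array(data_results$InstrumentSymbolCode)
ETO <- do.call(rbind,
               strsplit(gsub("(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*",
                             "\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", ETO),
                        "NONSENSESPLIT", fixed = TRUE))
# Blank out the rows that were already formatted correctly.
ETO[data_results$ProductCategoryID != 9, ] <- ""

# Temporary columns that will replace the originals.
temp1 <- array(0, nrow(ETO))
temp2 <- array(0, nrow(ETO))
temp3 <- array(0, nrow(ETO))

for (i in seq_len(nrow(ETO))) {
  if (data_results$ProductCategoryID[i] == 9) {
    # Row needed reformatting: take the split values.
    temp1[i] <- ETO[i, 1]
    temp2[i] <- ETO[i, 2]
    temp3[i] <- ETO[i, 3]
  } else {
    # Row was already correct: keep the existing values.
    temp1[i] <- as.character(data_results$InstrumentSymbolCode[i])
    temp2[i] <- as.character(data_results$PutCall[i])
    temp3[i] <- data_results$Strike[i]
  }
}

# Replace the entire columns with the temp columns.
data_results$InstrumentSymbolCode <- as.character(temp1)
data_results$PutCall <- temp2
data_results$Strike <- temp3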
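For comparison, the same replacement can probably be done without the loop and the temp columns. The following is only a sketch under some assumptions: the small data_results table is made-up illustration data (the real table is not shown here), the three target columns are assumed to be character rather than factor, and perl = TRUE is my addition to pin down how the lazy quantifier behaves. The regex itself is the one from the code above.

```r
# Made-up sample: three codes from the question plus one already-formatted row.
data_results <- data.frame(
  InstrumentSymbolCode = c("00700/Z14P120:xhkg", "6A/F15C0.905:xcme",
                           "BIDU/28X14C250:xcbf", "ABC"),
  PutCall           = c("", "", "", "P"),
  Strike            = c("", "", "", "10"),
  ProductCategoryID = c(9, 9, 9, 1),
  stringsAsFactors  = FALSE
)

# Same pattern as above: symbol, "/", expiry code, put/call letter, strike.
pattern <- "(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*"

# Only rows flagged with ProductCategoryID == 9 need reformatting.
needs_split <- data_results$ProductCategoryID == 9

# Copy the raw codes first, because the source column is overwritten below.
codes <- data_results$InstrumentSymbolCode[needs_split]

# sub() with a back-reference pulls out one capture group at a time,
# so no placeholder separator or strsplit() is needed.
data_results$InstrumentSymbolCode[needs_split] <- sub(pattern, "\\1", codes, perl = TRUE)
data_results$PutCall[needs_split]              <- sub(pattern, "\\2", codes, perl = TRUE)
data_results$Strike[needs_split]               <- sub(pattern, "\\3", codes, perl = TRUE)

print(data_results)
```

Because only the flagged rows are assigned, the already-formatted rows are left untouched. If the real columns are factors, they would need as.character() first, and Strike stays a character column unless it is converted with as.numeric() afterwards.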