
I can only use stringr or regular expressions, and I am working in R.

I have downloaded a CSV called mpg2 and made a subset of it containing only Mercedes-Benz makes. I am trying to split the model into its alphabetic and numeric parts so I can plot them. For example, a Mercedes C300 would need to be split into C and 300, and GLS500 into GLS and 500.

So now I have all of the models, and I want to split each one between its letters and its numbers.

I have tried

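library(stringr)
mercedes <- subset(mpg2, make == "Mercedes-Benz")
# note: "[0:9]" is the set of the characters "0", ":" and "9", not the range 0-9
str_split(mercedes$model, "[0:9]")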

But this doesn't do what I want, and playing with the n= argument didn't help either. Then I tried

mercedes$modelnumber <- as.numeric(gsub("([0-9]+).*$", "\\1", mercedes$model))

This makes a column containing only the numbers, but I can't get the letters out. If you need my specific dataset, let me know; I just have to figure out how to upload it.

Basically, I need to split "XYZ123" into its alphabetic and numeric parts and put them in two separate columns.

hk47
  • It's better if you give an example of real data! Is the string to be split always going to be in "XYZ123" format, or could numbers be interspersed in the alphanumeric part? – CHP Apr 14 '14 at 04:31
  • Is there a way I can send the dataset? Some examples are SL550, C300, 500SL, 380SL, etc. – hk47 Apr 14 '14 at 04:43

2 Answers


Something like this:

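x <- "XYZ123"
# put a comma in front of the run of digits: "XYZ,123"
x <- gsub("([0-9]+)", ",\\1", x)
# split on the comma: list(c("XYZ", "123"))
strsplit(x, ",")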

I've replaced the group of digits with a comma followed by that same group, so the string can then be split easily on the comma.
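Applied to a whole vector, the same comma trick splits every name at once. This is a minimal sketch using example names from the comment thread; it assumes commas never occur in the model names. As the last element shows, a digit-first name such as "500SL" only gets a leading empty string, so the digits are not separated from the letters.

```r
models <- c("C300", "SL550", "500SL")

# Put a comma in front of every run of digits, then split on the comma
marked <- gsub("([0-9]+)", ",\\1", models)
parts <- strsplit(marked, ",")

# parts[[3]] is c("", "500SL"): the comma landed at the start of the string
parts
```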

aelor
  • Okay, so how would I turn the split columns into something I can plot with ggplot? That is, how do I put these split columns back into the dataset? – hk47 Apr 14 '14 at 04:44
  • That's a different question overall, but I would suggest pushing the values into a 2D array. – aelor Apr 14 '14 at 04:45
  • Do you want me to ask it separately? – hk47 Apr 14 '14 at 04:47
  • @hk47, there is no need to ask it separately. Instead, a better option would be to edit your question with a sample of input and desired output so that those who are taking the time to help you answer your question are able to give the most helpful answers. – A5C1D2H2I1M1N2O1R2T1 Apr 14 '14 at 06:54
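One way to get the pieces back into the data set as two columns, without splitting at all, is to delete everything that is not a letter (or not a digit). A sketch under assumptions: the small `mercedes` data frame below is a hypothetical stand-in for the subset of mpg2, and the column names `series` and `number` are made up for illustration. It assumes each name has a single block of letters and a single block of digits; a name with several blocks would have them glued together. In stringr, `str_remove_all()` does the same job as `gsub(..., "")`.

```r
# Hypothetical stand-in for the Mercedes-Benz subset of mpg2
mercedes <- data.frame(model = c("C300", "GLS500", "SL550", "500SL", "380SL"),
                       stringsAsFactors = FALSE)

# Keep only the letters of each model name (works for both SL550 and 500SL)
mercedes$series <- gsub("[^A-Za-z]", "", mercedes$model)

# Keep only the digits, converted to numbers so they can go on a numeric axis
mercedes$number <- as.numeric(gsub("[^0-9]", "", mercedes$model))

mercedes
```

The new columns can then be mapped in ggplot2 like any other column, e.g. `aes(x = series, y = number)`.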

You can use something like this:

SplitMe <- function(string, alphaFirst = TRUE) {
  # zero-width split point: letter followed by digit, or digit followed by letter
  Pattern <- if (isTRUE(alphaFirst)) "(?<=[a-zA-Z])(?=[0-9])" else "(?<=[0-9])(?=[a-zA-Z])"
  strsplit(string, split = Pattern, perl = TRUE)
}

String <- c("C300", "GLS500", "XYZ123")
SplitMe(String)
# [[1]]
# [1] "C"   "300"
# 
# [[2]]
# [1] "GLS" "500"
# 
# [[3]]
# [1] "XYZ" "123"

To get the output as a two-column matrix, just use do.call(rbind, ...):

do.call(rbind, SplitMe(String))
#      [,1]  [,2] 
# [1,] "C"   "300"
# [2,] "GLS" "500"
# [3,] "XYZ" "123"
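For plotting, that character matrix can be turned into a data frame with named columns and a numeric second column. A minimal sketch that repeats the answer's alpha-first pattern so it runs on its own; the column names `series` and `number` are hypothetical.

```r
String <- c("C300", "GLS500", "XYZ123")

# Same alpha-first split as above, stacked into a character matrix
m <- do.call(rbind, strsplit(String, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE))

# Named columns (hypothetical names); digits converted to numbers
df <- data.frame(series = m[, 1], number = as.numeric(m[, 2]),
                 stringsAsFactors = FALSE)
df
```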

The above is just a convenience function that I have saved for the following scenarios:

strsplit(String, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)

and

strsplit(String, split = "(?<=[0-9])(?=[a-zA-Z])", perl = TRUE)
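Since the comment thread mixes both orders (SL550 next to 500SL), the two patterns can also be joined with `|` so a single call handles either order. This is a sketch building on the answer's patterns, not part of it. Because the letters come first in some pieces and second in others, it picks the letter piece and the digit piece explicitly rather than by position; `letters_part` and `digits_part` are illustrative names. It assumes one letter/digit boundary per name.

```r
models <- c("SL550", "C300", "500SL", "380SL")

# Split at a letter->digit OR a digit->letter boundary
pattern <- "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])"
parts <- strsplit(models, split = pattern, perl = TRUE)

# Piece order depends on the name, so select each piece by its content
letters_part <- vapply(parts, function(p) p[grepl("^[A-Za-z]+$", p)][1], character(1))
digits_part  <- vapply(parts, function(p) p[grepl("^[0-9]+$", p)][1], character(1))

data.frame(model = models, letters_part, digits_part, stringsAsFactors = FALSE)
```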

Note that the function only splits at the letter/digit boundary and leaves the characters themselves unchanged, so "GLS500" becomes "GLS" and "500".

A5C1D2H2I1M1N2O1R2T1