
I have a long list of city names, each combined with its province (state) name. This is part of my data:

data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
      'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')

I want to rearrange each string so that the state comes first, for example State_Jharkhand_Bokaro. If the city is a capital, the result should be State_Jharkhand_Capital_Ranchi. Note that a city or state name may consist of a single word or of several words (e.g. Tata Nagar).

What is the most efficient way to do this without using any loop?

Rajan

2 Answers


You could use the following gsub call:

> data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
+           'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
> gsub("^(?:(.*?)(_Capital))?(.*?)_(State.*)", "\\4\\2_\\1\\3", data)
[1] "State_Jharkhand_Capital_Ranchi"   "State_Jharkhand_Bokaro"          
[3] "State_Jharkhand_Tata Nagar"       "State_Jharkhand_Ramgarh"         
[5] "State_Maharashtra_Pune"           "State_Maharashtra_Capital_Mumbai"
[7] "State_Maharashtra_Nagpur" 
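The pattern relies on a non-capturing group (?:...) and lazy quantifiers .*?, which are Perl-style regex features. A minimal sketch of the same call with perl = TRUE set explicitly (an addition, not part of the original answer), with each capture group annotated:

```r
data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand',
          'Tata Nagar_State_Jharkhand', 'Mumbai_Capital_State_Maharashtra')

# \\1 = city name before "_Capital" (set only for capitals)
# \\2 = the literal "_Capital" marker (set only for capitals)
# \\3 = city name for non-capitals (empty for capitals)
# \\4 = "State_..." through the end of the string
pattern <- "^(?:(.*?)(_Capital))?(.*?)_(State.*)"

# Rebuild as: state part, optional "_Capital", "_", then the city name.
# Groups that did not take part in the match are substituted as "".
out <- gsub(pattern, "\\4\\2_\\1\\3", data, perl = TRUE)
out
```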


Avinash Raj
  • Thanks Avinash, but could you please modify it? In some places it's Capital_1 and in others Capital_2. – Rajan Apr 07 '15 at 07:00
  • Post an example here along with the expected output. – Avinash Raj Apr 07 '15 at 07:01
  • Suppose that instead of Capital there is Capital_1 and Capital_2, and I want State_Jharkhand_Capital_1_Ranchi and State_Jharkhand_Capital_2_Jhumari_Tilaiya. – Rajan Apr 07 '15 at 07:16
  • @Rajan, please edit your *question* to include the types of data that should be considered. Also, why do you want the output to be a single string? Why not a `list` or a `data.frame`? – A5C1D2H2I1M1N2O1R2T1 Apr 07 '15 at 07:19
  • try `gsub("^(?:(.*?)(_Capital(?:_\\d+)?))?(.*?)_(State.*)", "\\4\\2_\\1\\3", data)` – Avinash Raj Apr 07 '15 at 07:19
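A quick check of the pattern from the last comment, assuming inputs shaped like Rajan's expected output (the raw input strings below are hypothetical, since none were posted). Note that the city name Jhumari_Tilaiya itself contains an underscore; the lazy (.*?) before "_Capital" still captures all of it. perl = TRUE is added here as an assumption, for the same reason as above:

```r
data2 <- c('Ranchi_Capital_1_State_Jharkhand',
           'Jhumari_Tilaiya_Capital_2_State_Jharkhand',
           'Bokaro_State_Jharkhand')

# "_Capital" may now be followed by an optional "_<digits>" suffix
pattern2 <- "^(?:(.*?)(_Capital(?:_\\d+)?))?(.*?)_(State.*)"
out2 <- gsub(pattern2, "\\4\\2_\\1\\3", data2, perl = TRUE)
out2
```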

This doesn't really use regex; it relies on the expected position of each piece of information. Split the strings on "_" and then reorder the pieces as required:

data
# [1] "Ranchi_Capital_State_Jharkhand"   "Bokaro_State_Jharkhand"          
# [3] "Tata Nagar_State_Jharkhand"       "Ramgarh_State_Jharkhand"         
# [5] "Pune_State_Maharashtra"           "Mumbai_Capital_State_Maharashtra"
# [7] "Nagpur_State_Maharashtra"  

A <- strsplit(data, "_", fixed = TRUE)
sapply(A, function(x) {
  if (length(x) == 3) {
    paste(x[c(2, 3, 1)], collapse = "_")
  } else if (length(x) == 4) {
    paste(x[c(3, 4, 2, 1)], collapse = "_")
  } else {
    stop("unexpected length")
  }
})
# [1] "State_Jharkhand_Capital_Ranchi"   "State_Jharkhand_Bokaro"          
# [3] "State_Jharkhand_Tata Nagar"       "State_Jharkhand_Ramgarh"         
# [5] "State_Maharashtra_Pune"           "State_Maharashtra_Capital_Mumbai"
# [7] "State_Maharashtra_Nagpur"  
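The fixed-position indexing above either stops with "unexpected length" or silently reorders incorrectly for strings like the Capital_1 / Capital_2 examples from the comments, or for city names that contain an underscore. A possible sketch, not part of the original answer, that locates the "State" and "Capital" tokens by name instead of by position; the helper name reorder_parts and the sample strings are hypothetical:

```r
data2 <- c('Ranchi_Capital_1_State_Jharkhand',
           'Jhumari_Tilaiya_Capital_2_State_Jharkhand',
           'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand')

reorder_parts <- function(x) {
  # Position of the "State" token; it and everything after it form the state part
  s <- match("State", x)
  if (is.na(s) || s == 1) stop("no city name before 'State'")
  state <- x[s:length(x)]
  before <- x[seq_len(s - 1)]

  # Position of a "Capital" token within the city part, if there is one
  c_pos <- match("Capital", before)
  if (is.na(c_pos)) {
    return(paste(c(state, before), collapse = "_"))
  }
  city <- before[seq_len(c_pos - 1)]
  capital <- before[c_pos:length(before)]   # "Capital" plus any "1", "2", ...
  paste(c(state, capital, city), collapse = "_")
}

out3 <- vapply(strsplit(data2, "_", fixed = TRUE), reorder_parts, character(1))
out3
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector; it is still an implicit loop over the strings.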

I'm not sure whether using sapply violates your "without using any loop" requirement, though.
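If the loop-free requirement is strict, a sketch using only vectorized functions (sub, grepl, ifelse, paste) is possible. It assumes each string contains exactly one "_State_" and that an optional "Capital" or "Capital_<n>" marker sits immediately before it; the variable names and sample strings are illustrative:

```r
data3 <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand',
           'Tata Nagar_State_Jharkhand', 'Mumbai_Capital_1_State_Maharashtra')

# State part: from "State_" to the end (assumes one "_State_" per string)
state <- sub("^.*_(State_.*)$", "\\1", data3)

# City part, possibly still ending in a capital marker
before <- sub("_State_.*$", "", data3)

# Does the city part end in "_Capital" or "_Capital_<digits>"?
is_cap <- grepl("_Capital(_[0-9]+)?$", before)

# The marker itself ("Capital", "Capital_1", ...); for non-capitals sub()
# leaves the string unchanged, and that value is ignored by ifelse() below
capital <- sub("^.*_(Capital(_[0-9]+)?)$", "\\1", before)

# Strip the marker, leaving only the city name
city <- sub("_Capital(_[0-9]+)?$", "", before)

result <- ifelse(is_cap,
                 paste(state, capital, city, sep = "_"),
                 paste(state, city, sep = "_"))
result
```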

A5C1D2H2I1M1N2O1R2T1