
I have strings that contain U.S. state names. How can I efficiently replace each name with its abbreviation? I am aware of `state.abb[grep("New York", state.name)]`, but this works only if "New York" is the whole string. I have, for example, "Walmart, New York". Thanks in advance!
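To illustrate the limitation, here is a minimal sketch that uses only the built-in `state.name` and `state.abb` vectors (the variable names are just for illustration):

```r
# Looking up a bare state name finds its position in state.name
whole_name_lookup <- state.abb[grep("New York", state.name)]
whole_name_lookup
# [1] "NY"

# Using the full string as the pattern matches nothing in state.name
full_string_lookup <- state.abb[grep("Walmart, New York", state.name)]
full_string_lookup
# character(0)
```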

Let's assume this input:

x = c("Walmart, New York", "Hobby Lobby (California)", "Sold in Sears in Illinois")

Edit: the desired output is "Walmart, NY", "Hobby Lobby (CA)", "Sold in Sears in IL". As you can see, the state name can appear anywhere in a string.

Alexey Ferapontov
  • What are your expectations? To get "Walmart, N.Y."? But what are the assumptions? – Wiktor Stribiżew Oct 02 '15 at 18:50
  • "Walmart, NY" as per conventions (and that is what the `state.abb` would have done if it worked with full string) – Alexey Ferapontov Oct 02 '15 at 18:51
  • My suggestion: Split your string into "first part" (just the city?) and "state part". Make a table mapping from full state names to abbreviations. Merge. Don't combine the two parts of the string again, because why would you? – Frank Oct 02 '15 at 18:54
  • @stribizhev, that won't work with Arizona, Illinois, etc. – Alexey Ferapontov Oct 02 '15 at 18:55
  • @Frank, good working suggestion, although not a very efficient one, since I do not know a priori exactly how the state name will appear in the string. But it is something to start with – Alexey Ferapontov Oct 02 '15 at 18:56
  • Seems like you'll have to use an `*apply()` construct, probably with `Map()` since it's one-to-one. Something like `Map(sub, state.name, state.abb, x)` where `x` is the vector you wish to change – Rich Scriven Oct 02 '15 at 19:03
  • @RichardScriven, Yes! Although I am horrible with the `apply` family of functions and still need to learn the syntax. How would you write it? – Alexey Ferapontov Oct 02 '15 at 19:06
  • Refresh for the edited comment, I think that should work. Well, not perfectly but it's a start – Rich Scriven Oct 02 '15 at 19:06
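
A rough sketch of the loop-over-states idea from the comments, not a definitive implementation. Note that `Map(sub, state.name, state.abb, x)` as written would recycle `x` element by element against the 50 states rather than pass the whole vector to each call, so this sketch folds `gsub()` over the states with `Reduce()` instead. The function name `abbreviate_states_loop` is hypothetical, and longer names are replaced first on the assumption that "West Virginia" should not become "West VA".

```r
# Hypothetical helper: apply one fixed-string replacement per state
abbreviate_states_loop <- function(x) {
  # Longest names first, so "West Virginia" is handled before "Virginia"
  ord <- order(nchar(state.name), decreasing = TRUE)
  replace_one <- function(acc, i) {
    gsub(state.name[i], state.abb[i], acc, fixed = TRUE)
  }
  # Start from x and apply each state's replacement in turn
  Reduce(replace_one, ord, x)
}

x <- c("Walmart, New York", "Hobby Lobby (California)", "Sold in Sears in Illinois")
abbreviate_states_loop(x)
# [1] "Walmart, NY"         "Hobby Lobby (CA)"    "Sold in Sears in IL"
```

This makes 50 passes over the vector, so a single combined pattern is likely to be faster on large inputs.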

1 Answer

Here's a base R way, using gregexpr(), regmatches(), and `regmatches<-`():

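abbreviateStateNames <- function(x) {
    # One alternation pattern that matches any full state name
    pat <- paste(state.name, collapse="|")
    # Positions of every state-name match in each string
    m <- gregexpr(pat, x)
    # Look up the abbreviation for each matched name
    ff <- function(hits) state.abb[match(hits, state.name)]
    # Replace the matched substrings in place
    regmatches(x, m) <- lapply(regmatches(x, m), ff)
    x
}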

x <- c("Hobby Lobby (California)", 
       "Hello New York City, here I come (from Greensboro North Carolina)!")

abbreviateStateNames(x)
# [1] "Hobby Lobby (CA)"                                
# [2] "Hello NY City, here I come (from Greensboro NC)!"
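
The pattern above matches a state name anywhere, including inside a longer word. If that matters for your data, one possible variant wraps the alternation in word boundaries. This is only a sketch, assuming Perl-style regular expressions via `perl = TRUE`; the function name `abbreviateWholeStateNames` is made up for illustration.

```r
abbreviateWholeStateNames <- function(x) {
    # \\b on both sides restricts matches to whole words
    pat <- paste0("\\b(?:", paste(state.name, collapse = "|"), ")\\b")
    m <- gregexpr(pat, x, perl = TRUE)
    # Same lookup-and-replace step as in abbreviateStateNames()
    ff <- function(hits) state.abb[match(hits, state.name)]
    regmatches(x, m) <- lapply(regmatches(x, m), ff)
    x
}

abbreviateWholeStateNames(c("Hobby Lobby (California)", "A Texasville store"))
# [1] "Hobby Lobby (CA)"   "A Texasville store"
```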

Alternatively -- and quite a bit more naturally -- you can accomplish the same thing using the gsubfn package:

library(gsubfn)

pat <- paste(state.name, collapse="|")
gsubfn(pat, function(x) state.abb[match(x, state.name)], x)
# [1] "Hobby Lobby (CA)"                                
# [2] "Hello NY City, here I come (from Greensboro NC)!"

Josh O'Brien