State name to abbreviation

Question

I have a large file with a variable `state` that contains full state names. I would like to replace them with the state abbreviations (for example, "NY" for "New York"). Is there an easy way to do this, apart from using several if-else commands? Maybe using a replace() statement?

score 93 · Answer 1 · answered Mar 23 '11 at 21:54

R has two built-in constants that might help: state.abb with the abbreviations, and state.name with the full names. Here is a simple usage example:

> x <- c("New York", "Virginia")
> state.abb[match(x,state.name)]
[1] "NY" "VA"
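One thing worth knowing about this approach: match() returns NA for anything that is not in state.name, so unmatched inputs come back as NA. A minimal sketch, assuming you would rather keep the original text for such entries ("Puerto Rico" is only an illustrative input, not something from the question):

```r
x <- c("New York", "Virginia", "Puerto Rico")

# Look up abbreviations; entries not found in state.name give NA
abb <- state.abb[match(x, state.name)]
abb
# [1] "NY" "VA" NA

# Assumption: fall back to the original value where no match was found
result <- ifelse(is.na(abb), x, abb)
result
# [1] "NY"          "VA"          "Puerto Rico"
```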

Aniko


Thanks a lot. This just saved me 30 minutes of writing 50 if-else statements in R. – user227290 Mar 23 '11 at 22:07

@user227290: if you were thinking of ifelse, it might be wise to look at `?switch`. You never know when it might come in handy in the future. – Joris Meys Mar 24 '11 at 01:10

Assuming Washington DC is formatted as 'District of Columbia', I think `c(state.abb, 'DC')[match(x, c(state.name, 'District of Columbia'))]` works too – sbha Feb 01 '18 at 20:29

G. Grothendieck · Accepted Answer · 2018-10-01T17:54:25.063

44

1) grep the full name from state.name and use that to index into state.abb:

state.abb[grep("New York", state.name)]
## [1] "NY"

1a) or using which:

state.abb[which(state.name == "New York")]
## [1] "NY"

2) or create a vector of state abbreviations whose names are the full names and index into it using the full name:

setNames(state.abb, state.name)["New York"]
## New York 
##     "NY"

Unlike (1), this one works even if "New York" is replaced by a vector of full state names, e.g. setNames(state.abb, state.name)[c("New York", "Idaho")]
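A caveat for (1), sketched here on the assumption that the input is meant to be an exact state name: grep treats its first argument as a regular expression and matches substrings, so a name that is contained in another name returns more than one hit.

```r
# grep matches substrings, so "Virginia" also hits "West Virginia"
loose <- state.abb[grep("Virginia", state.name)]
loose
# [1] "VA" "WV"

# Anchoring the pattern restricts it to the exact name,
# which is what which() in (1a) does as well
exact <- state.abb[grep("^Virginia$", state.name)]
exact
# [1] "VA"
```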

answered Mar 23 '11 at 21:56


Will this work when the first argument of grep is a string vector? – user227290 Mar 23 '11 at 22:09

No, in that case use `match` in place of `grep` as suggested by Aniko or try `setNames(state.abb, state.name)[c("New York", "Idaho")]`. – G. Grothendieck Mar 23 '11 at 22:38
Thanks, it clarifies the issue. – user227290 Mar 24 '11 at 21:09
what if there was, say, part of `state.name` such as "ill" for "illinois"? Is there a solution when the pattern is a substring of the actual `state.name`? – jvalenti Nov 27 '18 at 18:17
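For the substring case raised in the last comment, one possible approach is a case-insensitive grep that only accepts an unambiguous hit. This is a rough sketch, not something from the answer above; the helper name `abb_from_partial` is made up for illustration, and the pattern is still interpreted as a regular expression.

```r
# Hypothetical helper: return the abbreviation only when exactly one
# state name contains the pattern (ignoring case), otherwise NA
abb_from_partial <- function(pattern) {
  hits <- grep(pattern, state.name, ignore.case = TRUE)
  if (length(hits) == 1) state.abb[hits] else NA_character_
}

abb_from_partial("ill")
# [1] "IL"

# "new" matches New Hampshire, New Jersey, New Mexico and New York,
# so it is treated as ambiguous
abb_from_partial("new")
# [1] NA
```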

score 11 · Answer 3 · answered Mar 14 '18 at 16:39

Old post I know, but wanted to throw mine in there. I learned on tidyverse, so for better or worse I avoid base R when possible. I wanted one with DC too, so first I built the crosswalk:

library(tidyverse)

st_crosswalk <- tibble(state = state.name) %>%
  bind_cols(tibble(abb = state.abb)) %>%
  bind_rows(tibble(state = "District of Columbia", abb = "DC"))

Then I joined it to my data:

left_join(data, st_crosswalk, by = "state")
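To make the join concrete, here is a small self-contained sketch. The `data` tibble and its `value` column are invented for illustration, and the crosswalk is built in one step with c() rather than with bind_cols/bind_rows; rows with no match in the crosswalk get NA.

```r
library(dplyr)

# Crosswalk of full names to abbreviations, with DC appended
st_crosswalk <- tibble(
  state = c(state.name, "District of Columbia"),
  abb   = c(state.abb, "DC")
)

# Assumed example data; "Guam" is deliberately absent from the crosswalk
data <- tibble(
  state = c("New York", "District of Columbia", "Guam"),
  value = c(1, 2, 3)
)

result <- left_join(data, st_crosswalk, by = "state")
result$abb
# [1] "NY" "DC" NA
```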

score 6 · Answer 4 · answered Jun 08 '15 at 22:25

I found that the built-in state.name and state.abb cover only the 50 states. I got a bigger table (including DC and so on) online (e.g., from this link: http://www.infoplease.com/ipa/A0110468.html) and pasted it into a .csv file named States.csv. I then load the state names and abbreviations from this file instead of using the built-ins. The rest is quite similar to @Aniko's answer.

library(stringdist)  # provides amatch()

# set the working directory to the folder containing States.csv first (setwd)
# load data
data = c("NY", "New York", "NewYork")
data = toupper(data)

# load state name and abbr.
State.data = read.csv('States.csv')
State = toupper(State.data$State)
Stateabb = as.vector(State.data$Abb)

# match data with state names, misspell of 1 letter is allowed
# (named idx so it does not shadow base::match)
idx = amatch(data, State, maxDist=1)
data[ !is.na(idx) ] = Stateabb[ na.omit( idx ) ]

The difference between match and amatch is that match only finds exact matches, whereas amatch returns the closest entry within a maximum string distance (maxDist). See pp. 25-26 here: http://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf
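If you do not have a States.csv at hand, the same idea can be tried against the built-in constants. This is only a sketch with made-up inputs (including the misspellings), using amatch from the stringdist package with its default distance method:

```r
library(stringdist)

# Assumed inputs: one exact name, two one-character typos, one abbreviation
x <- c("New York", "NewYork", "Virgina", "NY")

# Index of the closest state name within distance 1, NA if none qualifies
idx <- amatch(x, state.name, maxDist = 1)

abb <- state.abb[idx]
abb
# [1] "NY" "NY" "VA" NA
```

Inputs that are already abbreviations, like "NY" here, are too far from any full name and come back as NA, which is why the answer above only overwrites the entries where a match was found.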

score 3 · Answer 5 · answered Sep 28 '22 at 03:40

Here is another way of doing it in case you have more than one state in your data and you want to replace the names with the corresponding abbreviations.

#creating a list of names 
states_df <- c("Alabama","California","Nevada","New York",
               "Oregon","Texas", "Utah","Washington")

states_df <- as.data.frame(states_df)

The output is

> print(states_df)
   states_df
1    Alabama
2 California
3     Nevada
4   New York
5     Oregon
6      Texas
7       Utah
8 Washington

Now, using the built-in state.abb and state.name vectors, you can easily convert the names into abbreviations, and vice versa.

states_df$state_code <- state.abb[match(states_df$states_df, state.name)]

> print(states_df)
   states_df state_code
1    Alabama         AL
2 California         CA
3     Nevada         NV
4   New York         NY
5     Oregon         OR
6      Texas         TX
7       Utah         UT
8 Washington         WA
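The "vice versa" direction works the same way with the two vectors swapped. A short sketch, using a few example codes chosen for illustration:

```r
codes <- c("AL", "CA", "NY")

# Look up the position of each code in state.abb, then index state.name
full_names <- state.name[match(codes, state.abb)]
full_names
# [1] "Alabama"    "California" "New York"
```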

score 2 · Answer 6 · answered Aug 03 '18 at 17:48

You can also use base::abbreviate if you don't have US state names. This won't give you equally sized abbreviations unless you increase minlength.

library(magrittr)  # provides the %>% pipe
state.name %>% base::abbreviate(minlength = 1)
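As a rough illustration of the non-US case, here is abbreviate applied to a few Canadian province names (chosen as an example, not from the answer). The result is a named character vector, and the abbreviations are generated by an algorithm, so they will not necessarily match any official codes.

```r
provinces <- c("Ontario", "Quebec", "British Columbia")

# minlength is the minimum number of characters in each abbreviation
prov_abb <- abbreviate(provinces, minlength = 2)

# The names of the result are the original strings
names(prov_abb)
# [1] "Ontario"          "Quebec"           "British Columbia"

# Drop the names if a plain character vector is wanted
unname(prov_abb)
```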

psychonomics


score 1 · Answer 7 · answered Mar 06 '21 at 11:23

If matching state names to abbreviations or the other way around is something you have to do frequently, you could put Aniko's solution in a function in a .Rprofile or a package:

state_to_st <- function(x){
  c(state.abb, 'DC')[match(x, c(state.name, 'District of Columbia'))]
}


st_to_state <- function(x){
  c(state.name, 'District of Columbia')[match(x, c(state.abb, 'DC'))]
}

Using that function as a part of a dplyr chain:

library(tibble)  # enframe()
library(dplyr)   # mutate() and %>%
enframe(state.name, value = 'state_name') %>% 
  mutate(state_abbr = state_to_st(state_name))
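If the input is messy, a variant that normalizes case and surrounding whitespace before matching may be useful. This is a sketch of one possible extension, not part of the answer above; the name `state_to_st_ci` is hypothetical.

```r
# Hypothetical variant of state_to_st: ignore case and leading/trailing spaces
state_to_st_ci <- function(x) {
  full <- c(state.name, "District of Columbia")
  abbs <- c(state.abb, "DC")
  abbs[match(tolower(trimws(x)), tolower(full))]
}

out <- state_to_st_ci(c(" new york", "DISTRICT OF COLUMBIA", "Narnia"))
out
# [1] "NY" "DC" NA
```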
