If the abbreviations are all prefixes, you can use the .startswith()
string method against either the short or long version of the state.
>>> test_value = "Flor"
>>> test_value.upper().startswith("FL")
True
>>> "Florida".lower().startswith(test_value.lower())
True
However, if you have more complex abbreviations, difflib.get_close_matches
will probably do what you want!
>>> import pandas as pd
>>> import difflib
>>> df = pd.DataFrame({"states": ("Florida", "Texas"), "st": ("FL", "TX")})
>>> df
states st
0 Florida FL
1 Texas TX
>>> difflib.get_close_matches("Flor", df["states"].to_list())
['Florida']
>>> difflib.get_close_matches("x", df["states"].to_list(), cutoff=0.2)
['Texas']
>>> df["st"][df.index[df["states"]=="Texas"]].iloc[0]
'TX'
You will probably want to try/except IndexError
around reading the first member of the returned list from difflib and possibly tweak the cutoff to get less false matches with close states (perhaps offer all the states as possibilities to some user or require more letters for close states).
You may also see the best results combining the two; testing prefixes first before trying the fuzzy match.
Putting it all together
def state_from_partial(test_text, df, col_fullnames, col_shortnames):
if len(test_text) < 2:
raise ValueError("must have at least 2 characters")
# if there's exactly two characters, try to directly match short name
if len(test_text) == 2 and test_text.upper() in df[col_shortnames]:
return test_text.upper()
states = df[col_fullnames].to_list()
match = None
# this will definitely fail at least for states starting with M or New
#for state in states:
# if state.lower().startswith(test_text.lower())
# match = state
# break # leave loop and prepare to find the prefix
if not match:
try: # see if there's a fuzzy match
match = difflib.get_close_matches(test_text, states)[0] # cutoff=0.6
except IndexError:
pass # consider matching against a list of problematic states with different cutoff
if match:
return df[col_shortnames][df.index[df[col_fullnames]==match]].iloc[0]
raise ValueError("couldn't find a state matching partial: {}".format(test_text))
Beware of states which start with 'New' or 'M' (and probably others), which are all pretty close and will probably want special handling. Testing will do wonders here.