I have a DataFrame that I want to sort on one column with sort_values.
The problem is that some of the words start with a German umlaut.
For example, Österreich and Zürich are sorted as Zürich, Österreich.
The expected order is Österreich, Zürich.
Ö should sort next to O (after N and before P), not after Z.
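To reproduce the problem, here is a minimal sketch. The three-row frame and the name `result` are just illustrative. As far as I can tell, the default sort compares Unicode code points, and Ö (U+00D6) has a higher code point than Z (U+005A):

```python
import pandas as pd

# Small illustrative frame, not my real data
df = pd.DataFrame({"location": ["Österreich", "Zürich", "Bern"]})

# Default sort compares code points, so the precomposed Ö lands after Z
result = df.sort_values("location")["location"].tolist()
print(result)  # ['Bern', 'Zürich', 'Österreich']
```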
I have found out how to do this for plain Python lists using locale and strxfrm. Can I do the same directly on the pandas DataFrame?
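For reference, the list version I mean looks roughly like this. It is only a sketch: the helper name `sort_with_locale` is made up for illustration, and it assumes a German locale such as `de_DE.UTF-8` is installed on the system, which is not guaranteed.

```python
import locale


def sort_with_locale(words, locale_name="de_DE.UTF-8"):
    """Sort strings using the collation rules of the given locale."""
    # Remember the current collation setting so it can be restored afterwards
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the requested locale is not installed
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    try:
        print(sort_with_locale(["Zürich", "Österreich", "Bern"]))
    except locale.Error:
        print("locale de_DE.UTF-8 is not installed on this system")
```

The actual order depends on the collation tables of the installed locale, so I am not quoting an output here.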
Edit: Thank you. Stef's example worked quite well, but my real-life DataFrame also contains numbers, and his version did not work with those, so I used alexey's idea. I did the following; it can probably be shortened:
import pandas as pd

df = pd.DataFrame({'location': ['Österreich', 'Zürich', 'Bern', 254345], 'code': ['ö', 'z', 'b', 'v']})
# keep the original index as a column so we can join on it later
df = df.reset_index(drop=False)
# convert every value to str (the column mixes str and int)
df['location'] = df['location'].astype(str)
# sort by the NFD-normalized location so that umlauts sort next to their base letter
df_sort_index = df['location'].str.normalize('NFD').sort_values(ascending=True).reset_index(drop=False)
# drop location so we don't have it in both tables
df = df.drop('location', axis=1)
# inner join on the index column; the order of the sorted left table is kept
new_df = pd.merge(df_sort_index, df, how='inner', on='index')
# drop the helper index column (note: location now holds the NFD-normalized strings)
new_df = new_df.drop('index', axis=1)
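A possibly shorter variant, as a sketch only: it assumes pandas 1.1 or newer, where sort_values accepts a `key` argument, and the helper name `nfd_key` is my own. The key is used only for ordering, so the original values (including the int) stay untouched. Note that NFD is an approximation of German collation: Ö ends up after every plain "O…" word instead of being interleaved with them.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["Österreich", "Zürich", "Bern", 254345],
    "code": ["ö", "z", "b", "v"],
})


def nfd_key(column):
    # Cast to str first (the column mixes str and int), then decompose
    # umlauts into base letter + combining mark so Ö sorts under O
    return column.astype(str).str.normalize("NFD")


# The key only affects the ordering; the stored values are not modified
sorted_df = df.sort_values("location", key=nfd_key).reset_index(drop=True)
print(sorted_df)
```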