Why can pandas Series.str.extract(regex) print the correct values, but not assign them to an existing column using .loc indexing or Series.where?
import pandas as pd
import numpy as np
df = pd.DataFrame(
[
['1', np.nan, np.nan, '1 Banana St, 69126 Heidelberg'],
['2', "Doloros St", 67898, '2 Choco Rd, 69412 Eberbach']],
columns=['id', "Street", 'Postcode', 'FullAddress']
)
m = df['Street'].isna()
print(df["FullAddress"].str.extract(r'(.+?),')) # prints street
print(df["FullAddress"].str.extract(r'\b(\d{5})\b')) # prints postcode
df.loc[m, 'Street'] = df.loc[m, 'FullAddress'].str.extract(r'(.+?),') # assigns NaN instead of the street
df.loc[m, 'Postcode'] = df.loc[m, 'FullAddress'].str.extract(r'\b(\d{5})\b') # also assigns NaN
# trying the where method instead raises: NotImplementedError: cannot align with a higher dimensional NDFrame
df["Street"] = df["Street"].where(~(df["Street"].isna()), df["FullAddress"].str.extract(r'(.+?),'))
What I'm trying to do is fill the missing Street and Postcode values with the values extracted from FullAddress, without overwriting the Street and Postcode values that already exist.
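One possible approach to that goal (a sketch, not the only way): extract with `expand=False` so each pattern yields a Series aligned on `df`'s index, then use `fillna`, which only replaces missing entries and leaves existing Street and Postcode values untouched. Converting the extracted postcodes with `pd.to_numeric` is an assumption, made so they match the numeric `67898` already in the column; drop that line if string postcodes are preferred.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['1', np.nan, np.nan, '1 Banana St, 69126 Heidelberg'],
        ['2', "Doloros St", 67898, '2 Choco Rd, 69412 Eberbach'],
    ],
    columns=['id', 'Street', 'Postcode', 'FullAddress'],
)

# One capture group + expand=False -> a Series indexed like df
street = df['FullAddress'].str.extract(r'(.+?),', expand=False)
postcode = df['FullAddress'].str.extract(r'\b(\d{5})\b', expand=False)

# Assumption: keep Postcode numeric, like the existing 67898
postcode = pd.to_numeric(postcode)

# fillna only fills missing entries; existing values are left as they are
df['Street'] = df['Street'].fillna(street)
df['Postcode'] = df['Postcode'].fillna(postcode)

print(df[['id', 'Street', 'Postcode']])
```

Adding `expand=False` (or taking column `[0]` of the default result) to the original `df.loc[m, 'Street'] = ...` line should also work, because the right-hand side is then a Series that pandas aligns on the row index.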
The indexing, the regex and the extract each seem to work fine on their own. I've read the docs and searched for anything similar. What is everyone else getting that I'm not?!