Unfortunately, I can't share the actual data I'm working with for this question, so bear with me as I try to use a generalized example to help explain the error I'm seeing.
I have two dataframes, which we will call df_local and df_global, which I need to merge to get a full understanding of the data in my database. df_global has about 16 columns, the most relevant of which are: ['observation_id', 'min', 'max']. df_local has 4 columns, ['observation_id', 'local_id', 'min', 'max']. However, between the two dataframes, the observation_id is the same but the min and max mean different things. In df_local, the min and max are local minima and maxima, while the min and max in df_global are the actual min and max of the whole data set for that observation.
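To make the setup concrete, here is a minimal sketch of two frames with the shapes described above. Every value is invented for illustration, and the toy df_global carries only the three relevant columns rather than the roughly 16 in the real data, so this is a stand-in and not the actual data.

```python
import pandas as pd

# Toy stand-in for df_global: only the three relevant columns are shown;
# the real frame has about 16. All values are invented.
df_global = pd.DataFrame({
    "observation_id": [1, 2, 3],
    "min": [0.0, 1.5, 2.0],
    "max": [10.0, 11.5, 12.0],
})

# Toy stand-in for df_local: possibly several local extrema per observation.
df_local = pd.DataFrame({
    "observation_id": [1, 1, 2, 3],
    "local_id": [100, 101, 200, 300],
    "min": [0.0, 4.0, 1.5, 2.0],
    "max": [3.0, 10.0, 11.5, 12.0],
})

print(df_global.columns.tolist())
print(df_local.columns.tolist())
```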
When I merge the two dataframes using the following line of code:
df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')
I get no errors and df returns with the columns ['min_x', 'max_x', 'min_y', 'max_y']. Which is... fine... except that I want to rename the columns before I do the merge so that I know which one is the local and which one is the global.
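The _x and _y names come from the default value of the suffixes parameter of pd.merge, which is applied to overlapping non-key columns. The sketch below shows that behaviour on invented toy frames (assumed values, not the real data), along with passing custom suffixes as one possible way to label the columns without renaming beforehand. It does not reproduce or explain the error described in this question.

```python
import pandas as pd

# Invented toy frames with the overlapping 'min' and 'max' columns.
df_global = pd.DataFrame({
    "observation_id": [1, 2],
    "min": [0.0, 1.5],
    "max": [10.0, 11.5],
})
df_local = pd.DataFrame({
    "observation_id": [1, 1, 2],
    "local_id": [100, 101, 200],
    "min": [0.0, 4.0, 1.5],
    "max": [3.0, 10.0, 11.5],
})

# Default behaviour: overlapping columns get the '_x' / '_y' suffixes.
default_merge = pd.merge(df_global, df_local, on="observation_id", how="outer")
print(default_merge.columns.tolist())

# Custom suffixes label the left (global) and right (local) columns directly.
labelled_merge = pd.merge(
    df_global,
    df_local,
    on="observation_id",
    how="outer",
    suffixes=("_global", "_local"),
)
print(labelled_merge.columns.tolist())
```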
HOWEVER, when I rename df_local's columns to ['observation_id', 'local_id', 'local_min', 'local_max'] I get the following error on merge:
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
This is similar to this error here, so I checked to make sure I had no duplicate columns across either dataframe. I do not. Again, this error ONLY occurs when I try to rename df_local's columns before doing the merge. When I do not rename the columns, I do not get the error.
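For anyone wanting to repeat the duplicate-column check, one way it could be done is sketched below. The helper name duplicate_columns and the toy frames are hypothetical, introduced only for illustration; the check itself relies on Index.duplicated and Index.intersection from pandas.

```python
import pandas as pd

def duplicate_columns(df):
    """Return the column labels that occur more than once within df."""
    return df.columns[df.columns.duplicated()].tolist()

# Invented toy frames (not the real data).
df_global = pd.DataFrame({"observation_id": [1], "min": [0.0], "max": [10.0]})
df_local = pd.DataFrame(
    [[1, 100, 0.0, 3.0]],
    columns=["observation_id", "local_id", "local_min", "local_max"],
)

# Labels repeated inside a single frame (empty lists mean no duplicates).
print(duplicate_columns(df_global))
print(duplicate_columns(df_local))

# Labels shared between the two frames; ideally only the merge key.
shared = sorted(df_global.columns.intersection(df_local.columns))
print(shared)

# A frame that does contain a repeated label, for comparison.
df_dupes = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
print(duplicate_columns(df_dupes))
```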
I have no idea what's going on and have hunted across the internet for help and even asked the resident python gurus working with this data what might be the problem. We're all stuck.
I apologize for not being able to present actual data to show the error in action, but I hope the description is enough that someone might have a solution.
EDIT:
Here is what I can show of my script.
What works:
df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')
What doesn't work:
df_local.columns = ['observation_id', 'local_id', 'local_min', 'local_max']
df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')