
Unfortunately, I can't share the actual data I'm working with for this question, so bear with me as I try to use a generalized example to help explain the error I'm seeing.

I have two dataframes, which I'll call df_local and df_global, and I need to merge them to get a full picture of the data in my database. df_global has about 16 columns. The most relevant ones are ['observation_id', 'min', 'max']. df_local has 4 columns: ['observation_id', 'local_id', 'min', 'max']. The observation_id means the same thing in both dataframes, but min and max do not. In df_local they are local minima and maxima. In df_global they are the actual min and max of the whole data set for that observation.
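To make the description concrete, here is a minimal sketch of toy frames with the same column names. The values are made up. The frames also assume that one observation can have several local rows. The local_id column suggests this, but it is not confirmed.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
# df_global stands in for the ~16-column frame; only the relevant columns are shown.
df_global = pd.DataFrame({
    "observation_id": [1, 2],
    "min": [0.0, 5.0],    # global minimum for each observation
    "max": [10.0, 20.0],  # global maximum for each observation
})

# Assumption: several local extrema per observation, distinguished by local_id.
df_local = pd.DataFrame({
    "observation_id": [1, 1, 2],
    "local_id": [101, 102, 201],
    "min": [1.0, 2.0, 6.0],   # local minima
    "max": [4.0, 9.0, 15.0],  # local maxima
})

print(df_global.columns.tolist())  # ['observation_id', 'min', 'max']
print(df_local.columns.tolist())   # ['observation_id', 'local_id', 'min', 'max']
```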

When I merge the two dataframes using the following line of code:

df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')

The merge runs without errors, and df comes back with the columns ['min_x', 'max_x', 'min_y', 'max_y'] alongside the rest. That's... fine, except that I want to rename the columns before the merge so I know which pair is local and which is global.
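For reference, the `_x`/`_y` names come from the default `suffixes=('_x', '_y')` in `pd.merge`. The suffixes are applied to non-key columns that exist in both frames. Here is a small sketch on hypothetical toy frames. It also shows the `suffixes` parameter as one way to get labeled columns without renaming first.

```python
import pandas as pd

# Hypothetical toy frames with the column names from the question.
df_global = pd.DataFrame({"observation_id": [1, 2], "min": [0.0, 5.0], "max": [10.0, 20.0]})
df_local = pd.DataFrame({
    "observation_id": [1, 1, 2],
    "local_id": [101, 102, 201],
    "min": [1.0, 2.0, 6.0],
    "max": [4.0, 9.0, 15.0],
})

# Overlapping non-key columns get the default suffixes:
# '_x' for the left frame (df_global), '_y' for the right frame (df_local).
df = pd.merge(df_global, df_local, on="observation_id", how="outer")
print(df.columns.tolist())
# ['observation_id', 'min_x', 'max_x', 'local_id', 'min_y', 'max_y']

# The suffixes parameter lets you choose the labels instead of the defaults.
df_labeled = pd.merge(df_global, df_local, on="observation_id", how="outer",
                      suffixes=("_global", "_local"))
print(df_labeled.columns.tolist())
# ['observation_id', 'min_global', 'max_global', 'local_id', 'min_local', 'max_local']
```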

HOWEVER, when I rename df_local's columns to ['observation_id', 'local_id', 'local_min', 'local_max'], I get the following error on merge:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This looks similar to another question about the same error, so I checked that neither dataframe has duplicate column names. Neither does. Again, this error ONLY occurs when I rename df_local's columns before the merge. When I don't rename them, I don't get the error.
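One way to run the duplicate-column check described above is with `Index.duplicated`. The helper name `duplicate_columns` and the toy frames are hypothetical. The sketch also shows that a repeated label makes column selection return a 2-D DataFrame rather than a 1-D Series. That is one common way to end up with a "wrong number of dimensions" error, though it may not be what is happening with the real data.

```python
import pandas as pd

def duplicate_columns(frame: pd.DataFrame) -> list:
    """Return every column label that occurs more than once in `frame` (hypothetical helper)."""
    return frame.columns[frame.columns.duplicated()].unique().tolist()

# Hypothetical frames: df_ok is clean, df_bad accidentally repeats its key column.
df_ok = pd.DataFrame({"observation_id": [1], "local_min": [0.5], "local_max": [2.0]})
df_bad = pd.DataFrame([[1, 1, 0.5]], columns=["observation_id", "observation_id", "local_min"])

print(duplicate_columns(df_ok))   # []
print(duplicate_columns(df_bad))  # ['observation_id']

# With a repeated label, selecting the column yields a 2-D DataFrame
# instead of a 1-D Series.
print(type(df_bad["observation_id"]).__name__)  # DataFrame
```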

I have no idea what's going on. I've hunted across the internet for help and even asked the resident Python gurus who work with this data what the problem might be. We're all stuck.

I apologize for not being able to present actual data to show the error in action, but I hope the description is enough that someone might have a solution.

EDIT:

Here is what I can show of my script.

What works:

df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')

What doesn't work:

df_local.columns = ['observation_id', 'local_id', 'local_min', 'local_max']
df = pd.merge(df_global, df_local, on = 'observation_id', how = 'outer')
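Here is a minimal attempt to reproduce the failing pattern. It assumes toy frames with the column names from the question, and the values are made up. On frames like these the rename-then-merge runs without error. So the difference presumably lies somewhere in the real data or in how it is loaded.

```python
import pandas as pd

# Hypothetical toy frames; the real data may differ in ways that matter.
df_global = pd.DataFrame({"observation_id": [1, 2], "min": [0.0, 5.0], "max": [10.0, 20.0]})
df_local = pd.DataFrame({
    "observation_id": [1, 2],
    "local_id": [101, 201],
    "min": [1.0, 6.0],
    "max": [4.0, 15.0],
})

# Positional renaming: the list must supply exactly one label per existing column.
df_local.columns = ["observation_id", "local_id", "local_min", "local_max"]
df = pd.merge(df_global, df_local, on="observation_id", how="outer")

# No overlapping non-key columns remain, so no suffixes are added.
print(df.columns.tolist())
# ['observation_id', 'min', 'max', 'local_id', 'local_min', 'local_max']
```

Assigning to `.columns` is positional. If the real df_local's columns are not in the order observation_id, local_id, min, max, the new labels will silently land on the wrong columns. Printing `df_local.columns` just before the assignment is a cheap sanity check.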
Locke
  • Can you post the entire script that you are using? It is impossible to debug unless you show us your work so far. – Edeki Okoh Feb 04 '19 at 18:37
  • Yes, I will show what I can. – Locke Feb 04 '19 at 18:40
  • Can you post a full traceback of your error? Without more info it's hard to debug, but I think it has something to do with renaming ob_id and local_id to the same name again. – Edeki Okoh Feb 04 '19 at 18:56
  • Do you get the same error using the DataFrame.rename method to rename just the columns you care about? e.g. `df_local.rename(index=str, columns={'min': 'local_min', 'max': 'local_max'})` – Oluwafemi Sule Feb 04 '19 at 19:03
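Here is a sketch of the `rename` approach suggested in the comment, on a hypothetical toy frame. `rename` only touches the labels in the mapping. By default it returns a new DataFrame rather than modifying `df_local` in place. The `index=str` argument converts the row labels to strings and is not needed just to rename columns.

```python
import pandas as pd

# Hypothetical toy frame with the question's column names.
df_local = pd.DataFrame({
    "observation_id": [1, 2],
    "local_id": [101, 201],
    "min": [1.0, 6.0],
    "max": [4.0, 15.0],
})

# rename() maps only the listed labels and leaves the others untouched.
# It returns a new DataFrame by default, so keep the result.
renamed = df_local.rename(columns={"min": "local_min", "max": "local_max"})

print(df_local.columns.tolist())  # ['observation_id', 'local_id', 'min', 'max'] (unchanged)
print(renamed.columns.tolist())   # ['observation_id', 'local_id', 'local_min', 'local_max']
```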
