35

I am trying to do a pandas merge and get the error from the title when I run it. I am matching on 3 columns, whereas just before this I do a similar merge on only 2 columns and it works fine.

df = pd.merge(df, c, how="left",
        left_on=["section_term_ps_id", "section_school_id", "state"],
        right_on=["term_ps_id", "term_school_id", "state"])

Columns for the two dataframes:

df:

Index([u'section_ps_id', u'section_school_id', u'section_course_number',
       u'section_term_ps_id', u'section_staff_ps_id', u'section_number',
       u'section_expression', u'section_grade_level', u'state',
       u'sections_id', u'course_ps_id', u'course_school_id',
       u'course_number', u'course_schd_dept', u'courses_id',
       u'school_ps_id', u'course_school_id', u'school_name',
       u'school_abbr', u'school_low_grade', u'school_high_grade',
       u'school_alt_school_number', u'school_state', u'school_phone',
       u'school_fax', u'school_principal', u'school_principal_phone',
       u'school_principal_email', u'school_asst_principal',
       u'school_asst_principal_phone', u'school_asst_principal_email'],
      dtype='object')

c:

Index([u'term_ps_id', u'term_school_id', u'term_portion',
       u'term_start_date', u'term_end_date', u'term_abbreviation',
       u'term_name', u'state', u'terms_id', u'school_ps_id',
       u'term_school_id', u'school_name', u'school_abbr',
       u'school_low_grade', u'school_high_grade',
       u'school_alt_school_number', u'school_state', u'school_phone',
       u'school_fax', u'school_principal', u'school_principal_phone',
       u'school_principal_email', u'school_asst_principal',
       u'school_asst_principal_phone', u'school_asst_principal_email'],
      dtype='object')

Is it possible to merge on three columns like this? Is there anything wrong with the merge call here?

srishtigarg
lathomas64
    You seem to have two identical columns `"term_school_id"` in your `c` dataframe... Either delete one or rename it to avoid the duplicate name. – Primer Nov 21 '14 at 16:24

5 Answers

45

As mentioned in the comments, you have a dupe column: `term_school_id` appears twice in the column index of `c`.
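A minimal sketch of how the duplicate could be spotted, using a made-up miniature of `c` (the values and the reduced column set are assumptions for illustration, not the asker's data). It also shows that selecting a duplicated label yields a two-dimensional result, which is a plausible reason the merge reports "expected 1, got 2":

```python
import pandas as pd

# Hypothetical miniature of `c`: 'term_school_id' appears twice.
c = pd.DataFrame(
    [[1, 10, "NY", 10], [2, 20, "CA", 20]],
    columns=["term_ps_id", "term_school_id", "state", "term_school_id"],
)

# True for every column name that repeats an earlier one.
dupe_mask = c.columns.duplicated()
dupe_names = c.columns[dupe_mask].tolist()
print(dupe_names)

# A duplicated label selects a DataFrame (2 dimensions), not a Series (1).
key_ndim = c["term_school_id"].ndim
print(key_ndim)
```

Running the same `columns.duplicated()` check on the real `c` before merging should list any repeated names.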

JD Long
5

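This will remove the duplicated columns from the DataFrame, keeping the first occurrence of each name. Select with a boolean mask rather than a list of labels, because selecting a duplicated label returns every column that carries it:

df = df.loc[:, ~df.columns.duplicated()]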
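As a rough end-to-end sketch, here is the three-column merge from the question run on toy frames after the duplicate is dropped. The data and the trimmed column lists are invented for illustration; only the key names come from the question:

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are made up).
df = pd.DataFrame({
    "section_term_ps_id": [1, 2],
    "section_school_id": [10, 20],
    "state": ["NY", "CA"],
})
c = pd.DataFrame(
    [[1, 10, "NY", 10, "Fall"], [2, 20, "CA", 20, "Spring"]],
    columns=["term_ps_id", "term_school_id", "state",
             "term_school_id", "term_name"],
)

# Keep only the first occurrence of each column name.
c = c.loc[:, ~c.columns.duplicated()]

# The merge call from the question, unchanged.
merged = pd.merge(df, c, how="left",
                  left_on=["section_term_ps_id", "section_school_id", "state"],
                  right_on=["term_ps_id", "term_school_id", "state"])
print(merged["term_name"].tolist())
```

With unique column names on both sides, merging on three keys works the same way as merging on two.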
Shivpe_R
3

To address the duplicate column, you can either drop it using duplicated, with something like:

c = c.loc[:, ~c.columns.duplicated(keep='first')]

or make every name unique by appending a character to each column name of one of the DataFrames, for example: c.columns = [c.columns[i] + str(i) for i in range(len(c.columns))]

Keep in mind that in the second case you must adjust the merge call, because the key columns listed in right_on now have new names.
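A small sketch of the renaming route on toy data, to show what "adjusting the merge" means in practice. The frames and values are hypothetical; the point is only that `right_on` has to use the renamed keys:

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are made up).
df = pd.DataFrame({
    "section_term_ps_id": [1, 2],
    "section_school_id": [10, 20],
    "state": ["NY", "CA"],
})
c = pd.DataFrame(
    [[1, 10, "NY", 10], [2, 20, "CA", 20]],
    columns=["term_ps_id", "term_school_id", "state", "term_school_id"],
)

# Append each column's position to its name so every name is unique.
c.columns = [f"{name}{i}" for i, name in enumerate(c.columns)]
print(list(c.columns))

# right_on must now refer to the renamed key columns.
merged = pd.merge(df, c, how="left",
                  left_on=["section_term_ps_id", "section_school_id", "state"],
                  right_on=["term_ps_id0", "term_school_id1", "state2"])
print(merged["term_school_id3"].tolist())
```

Dropping the duplicate is usually less intrusive, since renaming every column also changes names that downstream code may rely on.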

2Obe
1

If there are no duplicate columns:

Upgrade pandas to a version above 1.1.0. Older versions seem to have a problem broadcasting values here. I had the same error locally, but the same code ran fine in Google Colab, which ships a recent pandas version, and that is how I traced the problem to the older version.

To upgrade pandas, use:

pip install --upgrade pandas
Grayrigel
Aman Saini
  • Welcome to SO! Are you aware that this question is almost 6 years old (and has several answers, including an accepted one)? And are you sure the duplicate problem is solved by simply upgrading? – Timus Oct 29 '20 at 12:55
  • @Timus Thanks :)... I have mentioned that this may work IF there are no duplicate columns. – Aman Saini Oct 30 '20 at 10:37
  • But the dupes are the problem ... ? – Timus Oct 30 '20 at 16:18
1

I faced a similar issue. The question is old, but this may help someone. We have Python code that works fine with version 0.25 of the library, but when the code is deployed to a pod with version 1.3.2 it starts throwing the error below:

ERROR - Error in line 34 ValueError Buffer has wrong number of dimensions (expected 1, got 2)\nTraceback (most recent call last)

Either downgrading the library to 0.25 or updating the code for the newer version resolves the issue.