
In an attempt to make my pandas code faster I installed modin and tried to use it. A merge of two data frames that had previously worked gave me the following error:

ValueError: can not merge DataFrame with instance of type <class 'pandas.core.frame.DataFrame'>

Here is the info of both data frames:

printing event_df.info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1980101 entries, 0 to 1980100
Data columns (total 5 columns):
other_id       object
id             object
category       object
description    object
date           datetime64[ns]
dtypes: datetime64[ns](1), object(4)
memory usage: 75.5+ MB
printing other_df info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 752438 entries, 0 to 752437
Data columns (total 4 columns):
id          752438 non-null object
other_id    752438 non-null object
Value       752438 non-null object
Unit        752438 non-null object
dtypes: object(4)
memory usage: 23.0+ MB

Here are some rows from event_df:

other_id              id  category     description  date
08E5A97350FC8B00092F  1   some_string  some_string  2019-04-09
17B71019E148415D      4   some_string  some_string  2019-11-08
17B71019E148415D360   7   some_string  some_string  2019-11-08

and here are 3 rows from other_df:

id   other_id                                  Value  Unit
a01  BE4F15A3AE8A508ACB45F0FC8CDC173D1628D283  3      some_string
a02  BE4F15A3AE8A508ACB45F0FC8CDC173D1628D283  3      some_string
a03  BE4F15A3AE8A508ACB45F0FC8CDC173D1628D283  3      some_string

I tried installing the version cited in the question "Join two modin.pandas.DataFrame(s)", but it didn't help.

Here's the line of code throwing the error:

joint_dataframe2 = pd.merge(event_df, other_df, on=["id", "other_id"])

It seems there is some problem with modin's merge functionality. Is there a workaround, such as using pandas for the merge and modin for a groupby.transform()? I tried overwriting the pandas import after the merge with import modin.pandas, but got an error saying pandas was referenced before assignment. Has anyone come across this problem, and if so, is there a solution?
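A hedged guess about the "referenced before assignment" error: Python raises UnboundLocalError when a function reads a name and also binds it somewhere in its body (an `import ... as name` counts as a binding), because the binding makes the name local to the whole function. The sketch below assumes the merge code lives inside a function; `broken_pipeline`, `fixed_pipeline` and `ON` are hypothetical names. Keeping plain pandas and modin under different aliases avoids the clash and gives the pandas-merge / modin-groupby split asked about.

```python
import pandas  # plain pandas, bound at module level

ON = ["id", "other_id"]  # hypothetical: the join keys from the question


def broken_pipeline(left, right):
    # The import at the end makes `pandas` a LOCAL name for the whole function,
    # so this first line raises UnboundLocalError before the import ever runs.
    merged = pandas.merge(left, right, on=ON)
    import modin.pandas as pandas  # noqa: F811 (never reached)
    return merged


def fixed_pipeline(left, right):
    # Assumption: modin is installed. A distinct alias leaves `pandas` untouched.
    import modin.pandas as mpd

    merged = pandas.merge(left, right, on=ON)  # merge with plain pandas
    return mpd.DataFrame(merged)  # continue in modin, e.g. for groupby().transform()
```

Because `fixed_pipeline` never rebinds `pandas`, the module-level import stays visible inside it.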

Boris

1 Answer


Your error suggests that you were merging an instance of modin.pandas.dataframe.DataFrame with an instance of pandas.core.frame.DataFrame, which modin's merge does not allow.

If that's indeed the case, you could convert the pandas DataFrame into a modin DataFrame first; then you should be able to merge them, I believe.
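A minimal sketch of that conversion, assuming modin is installed and relying on the modin `DataFrame` constructor accepting a pandas DataFrame. `align_backends` and `merge_on_keys` are hypothetical helpers, not part of modin. If modin cannot be imported, both frames stay plain pandas so the merge still runs.

```python
import pandas

try:
    import modin.pandas as mpd
except ImportError:  # modin not installed: everything stays plain pandas
    mpd = None


def align_backends(left, right):
    """Hypothetical helper: return (left, right) with matching DataFrame types."""
    if mpd is not None:
        if isinstance(left, mpd.DataFrame) and isinstance(right, pandas.DataFrame):
            right = mpd.DataFrame(right)  # pandas -> modin
        elif isinstance(right, mpd.DataFrame) and isinstance(left, pandas.DataFrame):
            left = mpd.DataFrame(left)  # pandas -> modin
    return left, right


def merge_on_keys(left, right, on=("id", "other_id")):
    """Hypothetical helper: align the two frame types, then do an inner merge."""
    left, right = align_backends(left, right)
    return left.merge(right, on=list(on))  # how="inner" is the default
```

With this, `merge_on_keys(event_df, other_df)` should work whether `other_df` came from pandas or modin, since the pandas frame is promoted to modin only when the other side is already a modin frame.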

BGHV