I need to import large datasets and merge them. I know there are other questions similar to this, but I could not find an answer specific to my problem. With dask I was able to read the large datasets into a dataframe, but I could not merge it with another dataframe.
import dask.dataframe as dd
import pandas as pd
# I have to use dask since with pandas I run out of memory and Python gets killed
ps = dd.read_csv('*.dsv',sep='|',low_memory=False)
mx = dd.read_csv('test.csv',sep='|',low_memory=False)
# this is where I get the error
mg = pd.merge(ps,mx,left_on='ACTIVITY_ID',right_on='WONUM')
ValueError: can not merge DataFrame with instance of type <class 'dask.dataframe.core.DataFrame'>
It is obvious that pd.merge cannot handle a dask dataframe, but how else can I do the merge? Can I use PySpark or some other method?