
I have a few dataframes stored inside a dict called my_dict. The keys of the dict are stored inside a list called filter_list.

filter_list = ["A", "B", "C", ...] 

my_dict["A"] gives me the following result:

    links       A
0   Q11@8.jpg   1
1   Q11@11.jpg  1
2   Q11@4.2.jpg 1
3   Q11@4.3.jpg 1

my_dict["B"] gives me the following result:

    links       B
0   Q11@8.jpg   1
1   A11@21.jpg  1
2   Q11@42.jpg  1
3   C11@4.jpg   1

and so on...

Now I want to merge all the dataframes together. I am using an outer join because I want the "links" column of my final dataframe to contain every link that appears in any of the dataframes.

As such, I use a loop to merge them iteratively but I keep getting an error message telling me

MemoryError:

with no further info. To release RAM during the loop I save the intermediate result to a pickle file on each iteration, but this doesn't help; I still get the same error.

This is the code I am using:

for index in tqdm(range(2,len(filter_list))):
    try:
        result = pd.read_pickle("result.pkl")
    except:
        pass
    if index == 2:
        result = pd.merge(my_dict[filter_list[0]], my_dict[filter_list[1]], on="links", how="outer")
    result = pd.merge(result, my_dict[filter_list[index]], on="links", how="outer")
    result.fillna(0, inplace=True)

    result[result.columns[1:]] = result[result.columns[1:]].astype(int)
    result.to_pickle("result.pkl")
    del result
AaronDT
Won't pd.concat([v for k,v in my_dict.items() if k in filter_list]) achieve exactly what you want? – Ezer K Aug 07 '18 at 16:35

1 Answer


I think what you are trying to achieve can be done with pd.concat:

result = (pd.concat([my_dict[key].set_index('links') for key in filter_list],
                    axis=1,sort=False)
            .fillna(0).reset_index())
result[result.columns[1:]] = result[result.columns[1:]].astype(int)

With your two dataframes A and B, it gives:

         index  A  B
0    Q11@8.jpg  1  1
1   Q11@11.jpg  1  0
2  Q11@4.2.jpg  1  0
3  Q11@4.3.jpg  1  0
4   A11@21.jpg  0  1
5   Q11@42.jpg  0  1
6    C11@4.jpg  0  1
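
For anyone who wants to try this end to end, here is a minimal self-contained sketch using only the two sample frames from the question. The drop_duplicates step and the rename_axis call are my own additions, not part of the snippet above: I am assuming each link should appear at most once per dataframe (repeated keys multiply rows in an outer merge, and concat along axis=1 can raise on a non-unique index), and I rename the index so the first column comes back as "links" regardless of pandas version.

```python
import pandas as pd

# Sample data copied from the question; the real my_dict holds more frames.
my_dict = {
    "A": pd.DataFrame(
        {"links": ["Q11@8.jpg", "Q11@11.jpg", "Q11@4.2.jpg", "Q11@4.3.jpg"], "A": 1}
    ),
    "B": pd.DataFrame(
        {"links": ["Q11@8.jpg", "A11@21.jpg", "Q11@42.jpg", "C11@4.jpg"], "B": 1}
    ),
}
filter_list = ["A", "B"]

# Assumption: a link should occur at most once per frame, so duplicates are
# dropped before "links" becomes the index.
frames = [
    my_dict[key].drop_duplicates(subset="links").set_index("links")
    for key in filter_list
]

# Align all frames on the "links" index in one step, fill the gaps with 0
# and convert the indicator columns back to integers.
result = pd.concat(frames, axis=1, sort=False).fillna(0).astype(int)

# Name the index explicitly so reset_index() yields a "links" column.
result = result.rename_axis("links").reset_index()

print(result)
```

This aligns everything once instead of building a new intermediate dataframe per loop iteration. If your real data does contain repeated links, check whether dropping them is actually what you want before using this.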
Ben.T