If I create a class that uses an imported library and pickle an instance of it with dill, the library cannot be found after I unpickle it:

import dill
from sklearn.metrics.cluster import adjusted_rand_score
import pandas as pd
import random
class Test1():
    def __init__(self, df):
        self.genomes = df

    @staticmethod
    def percentageSimilarityDistance(genome1, genome2):
        if len(genome1) != len(genome2):
            raise ValueError('Genome1 and genome2 must have the same length!')

        is_gene_correct = [1 if genome1[idx] == genome2[idx] else 0 for idx in range(len(genome1))]

        return (1 - sum(is_gene_correct)/(len(is_gene_correct) * 1.0))

    def createDistanceMatrix(self, distance_function):
        """Takes a dictionary of KO sets and returns a distance (or similarity) matrix which is basically how many genes do they have in common."""
        genomes_df = self.genomes.copy()
        no_of_genes, no_of_genomes = genomes_df.shape
        list_of_genome_names = list(genomes_df.columns)
        list_of_genomes = [list(genomes_df.loc[:, name]) for name in list_of_genome_names]
        distance_matrix = [[distance_function(list_of_genomes[i], list_of_genomes[j]) for j in range(no_of_genomes)] for i in range(no_of_genomes)]
        distance_matrix = pd.DataFrame(distance_matrix, columns = list_of_genome_names, index = list_of_genome_names)


        return distance_matrix

# create fake data
df = pd.DataFrame({'genome' + str(idx + 1): [random.randint(0, 1) for lidx in range(525)] for idx in range(10)})
test1 = Test1(df)

# save pickles
with open('test1.pkl', 'wb') as pkl:
    dill.dump(test1, pkl)

I can unpickle the file successfully, but when I call one of its methods it can't find pandas.

$ ipython
Python 3.5.4 |Anaconda custom (64-bit)| (default, Nov 20 2017, 18:44:38) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dill

In [2]: with open('test1.pkl', 'rb') as pkl:
   ...:     test1 = dill.load(pkl)
   ...:     

In [3]: test1.createDistanceMatrix(test1.percentageSimilarityDistance)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-5918638722b1> in <module>()
----> 1 test1.createDistanceMatrix(test1.percentageSimilarityDistance)

/space/oc13378/myprojects/python/dill_tests/dill_tests.py in createDistanceMatrix(self, distance_function)
     29         return distance_matrix
     30 
---> 31 class Test2():
     32     import dill
     33     from sklearn.metrics.cluster import adjusted_rand_score

NameError: name 'pd' is not defined

Is it possible to get this to work by only importing the dill library?

ojunk

1 Answer

I'm the dill author. The easy thing to do is to put the import inside the function. Further, if you put the import both inside and outside your function, then you won't have a speed hit on the first call of your function.
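
A minimal sketch of why the encapsulated import helps, using only the standard library. It assumes, as the traceback in the question suggests, that the unpickled method ends up running against a global namespace that no longer contains the imported name. That situation is imitated here by rebuilding each function over a fresh globals dict. The helper `rebuild_without_globals` and the two `area_*` functions are hypothetical names for illustration, not part of dill, and `math` stands in for pandas.

```python
import builtins
import math  # module-level import: reachable only through this module's globals
import types


def area_global_import(radius):
    # Relies on the name `math` existing in the function's global namespace.
    return math.pi * radius ** 2


def area_local_import(radius):
    # Encapsulated import: the name is bound every time the function runs,
    # so it does not depend on what the global namespace contains.
    import math
    return math.pi * radius ** 2


def rebuild_without_globals(func):
    # Hypothetical helper: rebuild `func` over a fresh global namespace that
    # holds only the builtins, imitating a session that never imported `math`.
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(func.__code__, fresh_globals, func.__name__)


restored_global = rebuild_without_globals(area_global_import)
restored_local = rebuild_without_globals(area_local_import)

try:
    restored_global(1.0)
    global_import_failed = False
except NameError:
    # Same failure mode as "name 'pd' is not defined" in the question.
    global_import_failed = True

# The function with the import inside its body still works.
local_result = restored_local(1.0)
```

Applied to the question's class, the same idea would mean adding `import pandas as pd` as the first statement of `createDistanceMatrix`, optionally keeping the module-level import as well.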

Mike McKerns
  • I know this question is old but I'm hitting the OP's same issue. Besides doing as you described, can one simply import the required libraries in the script that is using the un-dilled object? I ask as that's what I'm attempting to do and meeting the same error that OP has met. – Greg Hilston Apr 06 '20 at 15:12
  • @GregHilston: I believe that should work, as long as `dill` looks into the global dict, and finds the libraries it expects to find. The safer thing to do is to ensure that it doesn't have to look in the global dict for a module (i.e. by using an encapsulated import). There are also different settings on the Pickler/Unpickler that enable different behavior with regard to the global dict... see `dill.settings['recurse'] = True`. – Mike McKerns Apr 07 '20 at 17:05
  • Totally same page! I stumbled upon that setting after posting my question to you and forgot to update here. I got everything working just fine. Appreciate your time and creation of `dill`. – Greg Hilston Apr 07 '20 at 21:14