
I have a Python module consisting of multiple classes, of which the first one does an extensive data import and the following ones create files based on the output of the first class:

```python
import pandas as pd


class ImportData:
    def __init__(self):
        self.result_csv = None

    def import_file(self):
        self.result_csv = pd.read_csv("file.csv")
        return self.result_csv


class CreateDataObject1:
    def __init__(self):
        # A fresh ImportData per class: result_csv is None until import_file() runs
        import_data = ImportData()
        self.data_object_1 = import_data.result_csv

    def create(self):
        self.data_object_1 = self.data_object_1.loc[self.data_object_1["Zulu"]]


class CreateDataObject2:
    def __init__(self):
        import_data = ImportData()
        self.data_object_2 = import_data.result_csv

    def create(self):
        self.data_object_2 = self.data_object_2.loc[self.data_object_2["Foxtrott"]]
```

As you can see, I want to pass the instance variable from the first class to all other classes so that they can use it further. However, I do not want to invoke the import method of the first class every time, because it is computationally expensive. How can I ensure that all following classes only take the resulting instance variables from the first class without invoking its methods? Would it be advisable to use staticmethod / classmethod / class inheritance here? Thanks!

29nivek

2 Answers


Use dependency injection. Rather than instantiating ImportData and calling it inside each of the classes, provide data_object as an argument to the constructor.

The main entry point of your application is responsible for composing all of the objects together in the correct order, and it can decide whether to create new objects for each dependency or to reuse them.

```python
import pandas as pd


class ImportData:
    def __init__(self, file):
        # The expensive read runs once, when this object is constructed
        self.result_csv = pd.read_csv(file)


class CreateDataObject1:
    def __init__(self, data_object):
        # The DataFrame is injected instead of being imported here
        self.data_object_1 = data_object.loc[data_object["Zulu"]]


class CreateDataObject2:
    def __init__(self, data_object):
        self.data_object_2 = data_object.loc[data_object["Foxtrott"]]

...

if __name__ == "__main__":
    # Compose the objects once: import, then hand the same DataFrame to each consumer
    import_data = ImportData("file.csv")
    data_object = import_data.result_csv
    obj1 = CreateDataObject1(data_object)
    obj2 = CreateDataObject2(data_object)
```
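To see the injection pattern end to end, here is a minimal runnable sketch of the same classes. It assumes a small in-memory CSV (the hypothetical `CSV_TEXT`) in place of file.csv, assumes "Zulu" and "Foxtrott" are boolean columns so that .loc filters rows, and adds a hypothetical `read_count` attribute purely to show that the read happens only once:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv; "Zulu" and "Foxtrott"
# are assumed to be boolean columns, so .loc[...] acts as a row filter.
CSV_TEXT = """id,Zulu,Foxtrott
1,True,False
2,False,True
3,True,True
"""


class ImportData:
    # Hypothetical counter, only here to show how often the expensive read runs.
    read_count = 0

    def __init__(self, file):
        ImportData.read_count += 1
        self.result_csv = pd.read_csv(file)


class CreateDataObject1:
    def __init__(self, data_object):
        self.data_object_1 = data_object.loc[data_object["Zulu"]]


class CreateDataObject2:
    def __init__(self, data_object):
        self.data_object_2 = data_object.loc[data_object["Foxtrott"]]


if __name__ == "__main__":
    # Import once, then inject the same DataFrame into both consumers.
    import_data = ImportData(io.StringIO(CSV_TEXT))
    data_object = import_data.result_csv
    obj1 = CreateDataObject1(data_object)
    obj2 = CreateDataObject2(data_object)

    print(ImportData.read_count)              # 1
    print(obj1.data_object_1["id"].tolist())  # [1, 3]
    print(obj2.data_object_2["id"].tolist())  # [2, 3]
```

Because both constructors receive the same DataFrame object, the read cost is paid once no matter how many consumers are added, and a test can pass in a small fixture DataFrame without touching the consumer classes.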
flakes
  • Thank you, this solution worked! I guess this is also more pythonic than class inheritance? – 29nivek Mar 14 '23 at 08:43
  • @29nivek Class inheritance can still be Pythonic, but there's usually a simple solution somewhere to be found! "There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch." - [Zen of Python](https://docs.python-guide.org/writing/style/#zen-of-python) – flakes Mar 14 '23 at 14:15
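The question also asks about classmethod. As an alternative to dependency injection (not what this answer proposes), a classmethod can cache the DataFrame on the class so that the read runs at most once per path, even when every consumer asks for the data itself. A minimal sketch; `load` and `_cache` are hypothetical names, and the demo writes a throwaway CSV with assumed boolean columns because the real file is not available:

```python
import os
import tempfile

import pandas as pd


class ImportData:
    # Hypothetical class-level cache shared by all callers: file path -> DataFrame.
    _cache = {}

    @classmethod
    def load(cls, path):
        """Return the DataFrame for `path`, calling pd.read_csv only the first time."""
        if path not in cls._cache:
            cls._cache[path] = pd.read_csv(path)
        return cls._cache[path]


class CreateDataObject1:
    def __init__(self, path):
        # After the first load of this path, this is a dictionary lookup.
        df = ImportData.load(path)
        self.data_object_1 = df.loc[df["Zulu"]]


class CreateDataObject2:
    def __init__(self, path):
        df = ImportData.load(path)
        self.data_object_2 = df.loc[df["Foxtrott"]]


if __name__ == "__main__":
    # Hypothetical sample file so the sketch runs without the real CSV.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        f.write("id,Zulu,Foxtrott\n1,True,False\n2,False,True\n3,True,True\n")
        path = f.name
    try:
        obj1 = CreateDataObject1(path)
        obj2 = CreateDataObject2(path)
        print(len(ImportData._cache))             # 1
        print(obj1.data_object_1["id"].tolist())  # [1, 3]
        print(obj2.data_object_2["id"].tolist())  # [2, 3]
    finally:
        os.remove(path)
```

The trade-off is hidden shared state: every caller gets the same DataFrame object, so an in-place modification by one consumer is visible to all of them, and the cache is never refreshed if the file changes on disk. Explicit injection avoids both issues.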

There are many solutions to your problem; I think one class with custom methods is a bit simpler:

```python
import pandas as pd


class ImportData:
    def __init__(self, file):
        # Read once; every derived object below reuses self.df
        self.df = pd.read_csv(file)
        self.data_object1 = None
        self.data_object2 = None

    def create_object1(self):
        self.data_object1 = self.df.loc[self.df["Zulu"]]

    def create_object2(self):
        self.data_object2 = self.df.loc[self.df["Foxtrott"]]
```

Notice that you're basically just wrapping a DataFrame in a class and adding a few (hardcoded) methods. Maybe you just want the DataFrame itself, calling df.loc[df["Foxtrott"]] as needed?
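If no class is needed at all, the same result takes a couple of lines of plain pandas. A minimal sketch, assuming a small in-memory CSV (the hypothetical `CSV_TEXT`) and boolean "Zulu"/"Foxtrott" columns:

```python
import io

import pandas as pd

# Hypothetical sample data in place of the real CSV; "Zulu" and "Foxtrott"
# are assumed to be boolean columns used as row masks.
CSV_TEXT = """id,Zulu,Foxtrott
1,True,False
2,False,True
3,True,True
"""

# Read once, at the top level of the script.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Derive each subset from the shared DataFrame only when it is needed.
zulu_rows = df.loc[df["Zulu"]]
foxtrott_rows = df.loc[df["Foxtrott"]]

print(zulu_rows["id"].tolist())      # [1, 3]
print(foxtrott_rows["id"].tolist())  # [2, 3]
```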

anon01
  • Thanks, that also seems like a good approach! The only thing is that I would have extensive data transformations within the individual methods, but I think with sufficient documentation that should not be an issue – 29nivek Mar 14 '23 at 07:57
  • if the concern is method definitions getting too long, you can also define standalone functions that are called inside the method – anon01 Mar 14 '23 at 21:09
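A minimal sketch of that suggestion: the method bodies stay short and only chain calls to module-level helpers. `select_flagged` and `drop_flag_columns` are hypothetical stand-ins for the real transformations, and the in-memory CSV and boolean columns are assumptions:

```python
import io

import pandas as pd

# Hypothetical sample data; "Zulu" and "Foxtrott" are assumed boolean flags.
CSV_TEXT = """id,Zulu,Foxtrott
1,True,False
2,False,True
3,True,True
"""


def select_flagged(df, column):
    """Hypothetical helper: keep only the rows where the boolean `column` is True."""
    return df.loc[df[column]]


def drop_flag_columns(df):
    """Hypothetical follow-up step: remove the flag columns from the result."""
    return df.drop(columns=["Zulu", "Foxtrott"])


class ImportData:
    def __init__(self, file):
        self.df = pd.read_csv(file)
        self.data_object1 = None
        self.data_object2 = None

    def create_object1(self):
        # The method only orchestrates; the transformation logic lives in the helpers.
        self.data_object1 = drop_flag_columns(select_flagged(self.df, "Zulu"))

    def create_object2(self):
        self.data_object2 = drop_flag_columns(select_flagged(self.df, "Foxtrott"))


data = ImportData(io.StringIO(CSV_TEXT))
data.create_object1()
data.create_object2()
print(data.data_object1.columns.tolist())  # ['id']
print(data.data_object1["id"].tolist())    # [1, 3]
print(data.data_object2["id"].tolist())    # [2, 3]
```

Each helper takes and returns a plain DataFrame, so it can be documented and tested on its own without constructing ImportData.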