I would appreciate it if someone could help me understand the code example in the question below. I am trying to implement something similar using Apache Beam 2.13.0 with Python 3.7.3.

Why does custom Python object cannot be used with ParDo Fn?

I understand that network sockets are not serializable, since they are not objects that can return either a string or a tuple when serialized.

What I do not understand is why you need to call the superclass's __init__ inside __init__.

class PublishFn(beam.DoFn):
    def __init__(self, topic_path):
        self.topic_path = topic_path
        super(self.__class__, self).__init__()

    def process(self, element, **kwargs):
        if not hasattr(self, 'publisher'):
            from google.cloud import pubsub_v1
            self.publisher = pubsub_v1.PublisherClient()
        future = self.publisher.publish(self.topic_path, data=element.encode("utf-8"))
        return future.result()

Thanks.

Yu Watanabe

2 Answers

Apparently, the __init__ method of the parent class does some initialization which is required for the class instance to function properly. If you don't call this method, your code will likely break because the class won't be properly initialized.

The __init__ method of the parent class is not called automatically when you override that method in the child class (it works in the same way for other methods), so you need to call it explicitly.
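A minimal sketch of this behaviour, using hypothetical `Base`, `WithoutSuper`, and `WithSuper` classes as stand-ins (this is not Beam's actual `DoFn` implementation):

```python
class Base:
    def __init__(self):
        # State that the parent class relies on later.
        self.ready = True

    def is_ready(self):
        # Falls back to False when __init__ never set the attribute.
        return getattr(self, "ready", False)


class WithoutSuper(Base):
    def __init__(self, topic_path):
        # Base.__init__ is never called, so self.ready is never set.
        self.topic_path = topic_path


class WithSuper(Base):
    def __init__(self, topic_path):
        # Run the parent's initialization explicitly.
        super().__init__()
        self.topic_path = topic_path


print(WithoutSuper("some-topic").is_ready())  # False
print(WithSuper("some-topic").is_ready())     # True
```

Whether skipping the parent's `__init__` causes a visible failure depends on what the parent does there, which is why calling it is the safe default.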

ForceBru

Your class inherits from beam.DoFn. Presumably that class needs to set up some things in its __init__ method, or it won't work properly. Thus, if you override __init__, you need to call the parent class's __init__ or your instance may not function as intended.

I'd note that your current super call is actually subtly buggy. It's not appropriate to use self.__class__ as the first argument to super. You either need to write out the name of the current class explicitly, or not pass any arguments at all (the no-argument form of super is only valid in Python 3). Using self.__class__ might work for now, but it will break if you subclass PublishFn any further, and override __init__ again in the grandchild class.
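Here is a small sketch of how that failure shows up. The class names are hypothetical stand-ins, not Beam classes:

```python
class Base:
    def __init__(self):
        self.initialized = True


class Child(Base):
    def __init__(self):
        # Buggy: self.__class__ is the class of the instance,
        # which is not necessarily Child.
        super(self.__class__, self).__init__()


class GrandChild(Child):
    def __init__(self):
        super().__init__()


class FixedChild(Base):
    def __init__(self):
        # No-argument form (Python 3) always refers to the defining class.
        super().__init__()


class FixedGrandChild(FixedChild):
    def __init__(self):
        super().__init__()


Child()  # works, because self.__class__ happens to be Child here

try:
    # Inside Child.__init__, self.__class__ is GrandChild, so
    # super(GrandChild, self).__init__ resolves to Child.__init__ again.
    GrandChild()
    recursed = False
except RecursionError:
    recursed = True

print(recursed)                       # True
print(FixedGrandChild().initialized)  # True
```

In Python 2 you would write `super(FixedChild, self).__init__()` instead, naming the class explicitly.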

Blckknght