I am running Apache Beam Python code with the DirectRunner. It fails with an AttributeError raised in a thread:

AttributeError: '_SDFBoundedSourceRestrictionTracker' object has no attribute 'checkpoint'

Here is the relevant part of the code:

def run(argv=None):
    """Main entry point; defines and runs the barc records pipeline."""
    parser = argparse.ArgumentParser()

    parser.add_argument('--input',
                        type=str,
                        dest='input',
                        default='gs://{Bucket name}/Week28 - Weekly.xlsb',
                        help='Input file to process')

    args, pipeline_args = parser.parse_known_args(argv)

    pipeline_options = PipelineOptions(pipeline_args)

    with beam.Pipeline(options=pipeline_options) as p:

        if args.input and args.week_num:

            # Read master from BQ and key each row by name
            channel_master = (
                p
                | 'ReadMaster' >> beam.io.Read(beam.io.BigQuerySource(
                    query="SELECT * FROM DATASET.MASTER_TABLE"))
                | "Map on name" >> beam.Map(lambda elem: (elem['name'], elem)))

            # Read name
            gc = (
                p
                | 'ReadGC' >> beam.io.Read(beam.io.BigQuerySource(
                    query="SELECT Display_Name  FROM DEST.TABLE"))
                | 'yieldvals' >> beam.ParDo(PrintValsDoFn()))

            # Read FA data, keep writable rows, and key each row by name
            fa_data_rows = (
                p
                | 'ReadFaData' >> ReadFromText(args.fa.format(args.week_num))
                | 'ConvertFaToDict' >> beam.ParDo(ConvertFAToDictFn(
                    gracenoteEvent.GracenoteEventType('fa_input').get_dict_keys()))
                | 'FilterWritableRows' >> beam.Filter(
                    lambda row: str(row['FA_CODE?']).lower() == "true"
                    and row['GN_ID'] != '-')
                | "Map master on channel" >> beam.Map(
                    lambda x: (str(x['NAME']), x)))

The results are then written to BigQuery.

Traceback:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1254, in run
    self.function(*self.args, **self.kwargs)
  File "/Users/kshitijbhadage/gracenote/lib/python3.8/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 467, in initiate_checkpoint
    checkpoint_state.residual_restriction = tracker.checkpoint()
AttributeError: '_SDFBoundedSourceRestrictionTracker' object has no attribute 'checkpoint'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1254, in run
    self.function(*self.args, **self.kwargs)
  File "/Users/kshitijbhadage/gracenote/lib/python3.8/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 467, in initiate_checkpoint
    checkpoint_state.residual_restriction = tracker.checkpoint()
AttributeError: '_SDFBoundedSourceRestrictionTracker' object has no attribute 'checkpoint'

I'm not sure why this error occurs. I tried debugging line by line, but the issue persists.

Kshitij Bhadage
  • Hi there, can you please post the full stack trace? Also what version of the Python SDK are you using? – Cubez Jul 31 '20 at 20:21
  • Added traceback error. I am using python 3.7, apache-beam 2.23.0 & dataflow 2.4.0 – Kshitij Bhadage Aug 01 '20 at 06:13
  • Hi there, it looks like the `checkpoint()` method was removed in https://github.com/apache/beam/pull/9794 but the usages weren't updated in the PR. A fix was released in Apache Beam 2.24.0 in this PR https://github.com/apache/beam/pull/12192. Can you try updating the version and see if that works? – Cubez Aug 03 '20 at 21:40
  • @Cubez I updated the Apache Beam version and it's still 2.23.0 and not 2.24.0. I ran the following command: pip3 install apache-beam[gcp] --upgrade --> pip3 freeze | grep beam > apache-beam==2.23.0 – Kshitij Bhadage Aug 04 '20 at 05:24
  • Does `pip install -e git+https://github.com/apache/beam.git@c2369bd815943241b37feab528c912d58c1bbc80` work? I took the latest commit from the Apache Beam repo and followed this: https://stackoverflow.com/questions/47479613/how-do-i-force-pip-to-install-from-the-last-commit-of-a-branch-in-a-repo. This is a bug in the DirectRunner, so you should successfully run this code on other runners. – Cubez Aug 04 '20 at 17:19
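
Following the comments above, a quick way to tell whether the installed SDK predates the release named as containing the fix (2.24.0, per Cubez) is to compare version numbers before running the pipeline. This is only a minimal sketch: the helper names `parse_version`, `has_checkpoint_fix`, and `installed_beam_version` are made up for illustration, and the comparison looks only at the numeric `major.minor.patch` part, so a pre-release such as `2.24.0rc1` would be treated the same as `2.24.0`.

```python
import re
from importlib import metadata  # standard library, Python 3.8+

# Release that, according to the comment thread, contains the fix.
FIXED_IN = (2, 24, 0)


def parse_version(version):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # A missing patch component (e.g. "2.25") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_checkpoint_fix(version):
    """True if `version` is at or after the release named in the comments."""
    return parse_version(version) >= FIXED_IN


def installed_beam_version():
    """Return the installed apache-beam version string, or None if absent."""
    try:
        return metadata.version("apache-beam")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_beam_version()
    if found is None:
        print("apache-beam is not installed in this environment")
    elif has_checkpoint_fix(found):
        print("apache-beam %s is at or after 2.24.0" % found)
    else:
        print("apache-beam %s predates 2.24.0; the DirectRunner bug may apply" % found)
```

Running this inside the same virtualenv as the pipeline avoids checking a different interpreter's packages than the one that actually executes the job.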
