I'm trying to figure out why I'm getting this error. A quick search only turned up answers that blamed a broken Beam version, but that doesn't seem to be the case here. Creating the template works fine, but when I run it (passing the limit argument) I get the error below. The idea is to build the query from the arguments passed to the template. If there's a better way to do this, I'm open to it.

Code:

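import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.value_provider import NestedValueProvider

class Options(PipelineOptions):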
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--limit',
            default=0,
            type=int,
            help='Limit the amount of rows retrieved'
        )

...
    
def from_bq(options):
    with beam.Pipeline(options=options) as p:
        (p 
            | 'Read From BQ' >> beam.io.ReadFromBigQuery(query=NestedValueProvider(options.limit, create_query), use_standard_sql=True)
        ) 
    
def create_query(limit):
    query = """
        SELECT * FROM ...
    """

    if limit > 0:
        query = query + " LIMIT {limit}".format(limit=limit)

    return query

Error:

raise error.RuntimeValueProviderError('%s not accessible' % obj)
apache_beam.error.RuntimeValueProviderError: NestedValueProvider(value: RuntimeValueProvider(option: limit, type: int, default_value: 0), translator: create_query) not accessible [while running 'Read From BQ/Read/Split-ptransform-324']

Running apache-beam version 2.27.0.

  • Just to clarify, I've also tried using a StaticValueProvider as an experiment. It didn't work in 2.27.0 but did work in 2.26.0, so ValueProviders seem to be partially broken in the later version. I didn't manage to get a regular ValueProvider to work in any version, though. – CaptainBarefoot Feb 04 '21 at 18:18
  • Did you find a solution? As far as I understand, the transform where you want to use the NestedValueProvider (ReadFromBigQuery in this case) has to support it. In my case it's ReadFromMongoDB, which is also problematic. – Alessandro Calmanovici Feb 01 '23 at 11:23

1 Answer

I don't think this is possible with standard templates. You should look into using Flex Templates, which have the full flexibility of non-template pipelines.
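With Flex Templates, the pipeline code runs with the real parameter values when the job is launched, so `--limit` can be an ordinary `argparse` argument and the query can be a plain string built before the graph is constructed. The following is a minimal sketch that reuses the idea of the question's `create_query`. The helper names `parse_args` and `run` and the table `my_dataset.my_table` are placeholders, and the sketch hasn't been checked against an actual Flex Template launch:

```python
import argparse


def create_query(limit: int) -> str:
    # Placeholder table name; substitute your own dataset/table.
    query = "SELECT * FROM `my_dataset.my_table`"
    if limit > 0:
        query += f" LIMIT {limit}"
    return query


def parse_args(argv=None):
    """Pull out --limit; everything else is left for Beam's PipelineOptions."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--limit",
        type=int,
        default=0,
        help="Limit the amount of rows retrieved (0 = no limit)",
    )
    return parser.parse_known_args(argv)


def run(argv=None):
    # Beam is imported here only so the helpers above can be exercised
    # without apache-beam installed; a real module would import at the top.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, pipeline_args = parse_args(argv)
    # known_args.limit is a plain int at graph construction time,
    # so no ValueProvider is involved.
    query = create_query(known_args.limit)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            | "Read From BQ" >> beam.io.ReadFromBigQuery(query=query, use_standard_sql=True)
        )
```

Call `run()` from the template's entry point. Any arguments that `parse_args` doesn't recognize are passed through unchanged to `PipelineOptions`.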

robertwb
  • What exactly isn't possible? The official documentation covers ValueProviders here: https://cloud.google.com/dataflow/docs/guides/templates/creating-templates – CaptainBarefoot Feb 04 '21 at 10:23
  • Support for ValueProviders must be manually wired through every transform (and they cannot actually be inspected at graph construction time, e.g. to influence the construction itself, even "accidentally", which is what might have happened here due to some other change). This fragility is one reason we're moving away from them. If an old SDK works, you could stick with it, but I would move to Flex Templates, which are the future. – robertwb Feb 09 '21 at 01:50
  • I'd use Flex Templates if their startup time weren't close to 10 minutes. Are there any solutions for this? – CaptainBarefoot Feb 10 '21 at 09:43
  • Good point about startup times. All I know is that it's being worked on. – robertwb Feb 11 '21 at 01:55
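The construction-time versus run-time distinction robertwb describes can be illustrated with a toy, dependency-free model. The classes below (`ToyRuntimeValue`, `ToyNested`, `NotAccessibleError`) are hypothetical stand-ins that only loosely mimic Beam's `RuntimeValueProvider`, `NestedValueProvider` and `RuntimeValueProviderError`. They are not Beam's implementation. Anything that calls `.get()` before runtime options exist fails with a "not accessible" error like the one in the question. Calling `.get()` only after the options are supplied works:

```python
class NotAccessibleError(RuntimeError):
    """Toy stand-in for Beam's RuntimeValueProviderError."""


class ToyRuntimeValue:
    """Toy stand-in for a runtime value provider (hypothetical, not Beam's code).

    The value does not exist until the runner supplies runtime options;
    a class-level None means "the graph is still being constructed".
    """

    _runtime_options = None

    def __init__(self, option, default):
        self.option = option
        self.default = default

    @classmethod
    def set_runtime_options(cls, options):
        # Simulates the runner handing over template parameters at execution time.
        cls._runtime_options = options

    def is_accessible(self):
        return ToyRuntimeValue._runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise NotAccessibleError(f"{self!r} not accessible")
        return ToyRuntimeValue._runtime_options.get(self.option, self.default)

    def __repr__(self):
        return f"ToyRuntimeValue(option: {self.option}, default_value: {self.default})"


class ToyNested:
    """Toy stand-in for a nested provider: applies `translator` lazily on get()."""

    def __init__(self, value, translator):
        self.value = value
        self.translator = translator

    def is_accessible(self):
        return self.value.is_accessible()

    def get(self):
        # Resolves the wrapped value only when asked, so it inherits its accessibility.
        return self.translator(self.value.get())


def create_query(limit):
    query = "SELECT * FROM my_table"  # placeholder table name
    if limit > 0:
        query += f" LIMIT {limit}"
    return query


query = ToyNested(ToyRuntimeValue("limit", 0), create_query)

# "Graph construction": nothing has supplied --limit yet, so get() fails.
try:
    query.get()
except NotAccessibleError as err:
    print("construction time:", err)

# "Run time": the runner has supplied --limit=10, so get() now succeeds.
ToyRuntimeValue.set_runtime_options({"limit": 10})
print("run time:", query.get())
```

The `try` block stands in for a transform that inspects the provider too early. Deferring `get()` until the values exist is, roughly, what every ValueProvider-aware transform has to do internally.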