
There are times when an incremental pipeline in Palantir Foundry has to be built as a snapshot. If the data is large, we increase the resources for that build to reduce run time, and then remove the configuration after the first snapshot run. Is there a way to set the configuration conditionally? For example: if the pipeline is running in incremental mode, use the default resource allocation; otherwise, use a specified set of resources.

For example, if the pipeline runs as a snapshot transaction, the following configuration should be applied:

@configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"]) 

If it runs incrementally, the default configuration should be used.

  • Would it be possible to share some code examples on what the expected behaviour would be from your end? – fmsf Mar 01 '22 at 19:17

1 Answer

The @configure and @incremental decorators are evaluated during CI execution, while the code inside the function decorated with @transform_df or @transform runs at build time.
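To see why the decision cannot be made inside the function body, here is a plain-Python sketch. It uses no Foundry APIs: `configure_stub` is a hypothetical stand-in for `@configure`. Python evaluates a decorator and its arguments once, when the `def` statement executes on import; the function body only runs later, when called. This illustrates the general Python behaviour only, not Foundry's internals.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def configure_stub(profile: List[str]) -> Callable[[F], F]:
    """Hypothetical stand-in for a config decorator (NOT Foundry's @configure).

    It copies `profile` onto the function at decoration time, i.e. when the
    `def` statement below runs during module import.
    """
    def wrap(func: F) -> F:
        func.profile = list(profile)  # type: ignore[attr-defined]
        return func
    return wrap


PROFILE = ["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"]


@configure_stub(profile=PROFILE)
def compute() -> List[str]:
    # The body runs later (the "build time" analogue); the decorator already ran.
    return compute.profile  # type: ignore[attr-defined]


# Rebinding the name after the decorator ran does not change what it captured.
PROFILE = []

print(compute())
```

Anything that should influence the decorators therefore has to be known when the module is imported, which is why the answer switches on a constant rather than on something computed inside the transform.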

This means you can't programmatically switch between them once CI has passed. What you can do, however, is keep a constant or configuration value in your repo and switch at the code level whenever you want to change modes. Make sure you understand how semantic versioning works before attempting this. For example:

from transforms.api import configure, incremental, transform_df, Input, Output

# Flip this constant (and commit) to switch modes; the choice is fixed once CI has run.
IS_INCREMENTAL = True
SEMANTIC_VERSION = 1


def mytransform(input1, input2):
    # Shared logic used by both variants of the transform
    return input1.join(input2, "foo", "left")


if IS_INCREMENTAL:
    @incremental(semantic_version=SEMANTIC_VERSION)
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"),
    )
    def compute(input1, input2):
        return mytransform(input1, input2)
else:
    # Snapshot run: request more resources
    @configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"),
    )
    def compute(input1, input2):
        return mytransform(input1, input2)
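A possible variation that avoids defining `compute` twice is a small generic helper, here called `apply_if` (a hypothetical name, not part of `transforms.api`), that applies a decorator only when a module-level constant is true. The condition is still evaluated on import, so this has the same limitation as the `if IS_INCREMENTAL:` version. The sketch uses a stand-in `label` decorator (also hypothetical) so it runs in plain Python.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def apply_if(condition: bool, decorator: Callable[[F], F]) -> Callable[[F], F]:
    """Apply `decorator` only when `condition` is true; otherwise return the function unchanged.

    `condition` is evaluated once, at import time, so it should be a constant
    such as IS_INCREMENTAL rather than something only known at build time.
    """
    def wrap(func: F) -> F:
        return decorator(func) if condition else func
    return wrap


def label(name: str) -> Callable[[F], F]:
    """Hypothetical stand-in decorator (NOT a Foundry API): records a label on the function."""
    def wrap(func: F) -> F:
        func.labels = getattr(func, "labels", []) + [name]  # type: ignore[attr-defined]
        return func
    return wrap


IS_INCREMENTAL = False


# Decorators apply bottom-up: the incremental stand-in is skipped, the configure one is applied.
@apply_if(not IS_INCREMENTAL, label("configure"))
@apply_if(IS_INCREMENTAL, label("incremental"))
def compute(x: int) -> int:
    return x + 1
```

In a Foundry repository you would presumably pass the real decorators instead, stacking `@apply_if(not IS_INCREMENTAL, configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"]))` and `@apply_if(IS_INCREMENTAL, incremental(semantic_version=SEMANTIC_VERSION))` above `@transform_df(...)`. This has not been verified against Foundry's CI checks, so treat it as an untested sketch.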