
I created a library for updating the descriptions of the columns of an input dataset. The function takes three parameters (input_dataset, output_dataset, config file) and eventually writes the descriptions back to the output dataset. We now want to import this library across various use cases. How do I handle the cases where we write a Spark transformation that takes its inputs through transform_df, since there we can't assign to an output variable? In that situation, how can I call my description library function? How should I proceed in Palantir Foundry? Any suggestions?
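For illustration, a minimal sketch of what such a function could look like (the name update_column_descriptions, the config shape, and the column_descriptions argument are assumptions for this sketch, not the actual library code):

def update_column_descriptions(input_dataset, output_dataset, config):
  # input_dataset / output_dataset are the TransformInput / TransformOutput handles;
  # config maps column name -> description text.
  df = input_dataset.dataframe()
  descriptions = {c: config[c] for c in df.columns if c in config}
  # Foundry's write_dataframe accepts a column_descriptions mapping.
  output_dataset.write_dataframe(df, column_descriptions=descriptions)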


1 Answer


This method isn't currently supported using the @transform_df decorator; you'll have to use the @transform decorator at the moment.

The reasoning: metadata APIs need the broader access that the @transform decorator already provides, so it made sense to keep them there rather than expose them through @transform_df, which is inherently higher-level.

You can always simply move over your transformations from...

from transforms.api import transform_df, Input, Output


@transform_df(
  Output("/my/output"),
  my_input=Input("/my/input"),
)
def my_compute_function(my_input):
  df = my_input
  # ... logic ....
  return df

...to...

from transforms.api import transform, Input, Output


@transform(
  my_output=Output("/my/output"),
  my_input=Input("/my/input")
)
def my_compute_function(my_input, my_output):
  df = my_input.dataframe()
  # ... logic ....
  my_output.write_dataframe(df)

...in which only six lines of code need to change.
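Once you're on @transform, you can call your description library inside the compute function, since the raw TransformInput and TransformOutput handles are both available there. A minimal sketch, assuming your library exposes a function with the three-parameter signature you described (the import path my_description_lib, the function name update_column_descriptions, and the config shape below are placeholders, not your actual library):

from transforms.api import transform, Input, Output

from my_description_lib import update_column_descriptions  # hypothetical import path

# Example config mapping column names to descriptions (shape is an assumption).
CONFIG = {"id": "Primary key", "name": "Customer name"}


@transform(
  my_output=Output("/my/output"),
  my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
  df = my_input.dataframe()
  # ... logic ....
  # Hand the raw handles to the library so it can write the dataframe
  # and the column descriptions back to the output itself.
  update_column_descriptions(my_input, my_output, CONFIG)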

vanhooser
  • Hi! This solution looks good. However, if my dataframe's primary key is Column A, and I want to use the "my_output" df as a history dataframe, would the write_dataframe function still append the rows with duplicate Column A values into the "my_output" df? – HRDSL Jan 18 '23 at 14:29