Questions tagged [foundry-code-repositories]

Questions related to development using Palantir Foundry's Code Repositories application are appropriate here.

Code Repositories is Foundry’s suite of applications for data engineers. Users can write data transformation code in the coding language of their choice and collaborate using distributed version control. Users may use the included web-based code editor to edit code in repositories or check code out to a local development environment. The Transforms engine manages metadata to register new datasets with Foundry and can be used to ensure new contributions meet certain criteria before they are built by the distributed computation engine.
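
For orientation, a minimal Python transform in a Code Repositories repository looks roughly like the sketch below (dataset paths and column names are placeholders, not from any real project):

    from transforms.api import transform_df, Input, Output

    @transform_df(
        Output("/Project/datasets/clean_flights"),   # hypothetical output path
        raw=Input("/Project/datasets/raw_flights"),  # hypothetical input path
    )
    def compute(raw):
        # Plain PySpark: the decorator wires Foundry datasets to Spark DataFrames
        # and registers the returned DataFrame as the output dataset.
        return raw.filter(raw["status"] == "landed")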

210 questions
9 votes, 2 answers

What is the difference between transform & transform_df in Palantir Foundry?

Can someone explain why we need transform & transform_df methods separately?
sumitkanoje
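
A hedged sketch of the difference (dataset paths are placeholders): transform_df hands your function Spark DataFrames and writes whatever you return, while transform hands you TransformInput/TransformOutput objects so you read and write explicitly, which is what enables multiple outputs and raw file access.

    from transforms.api import transform, transform_df, Input, Output

    # transform_df: inputs arrive as DataFrames, the returned DataFrame is written for you.
    @transform_df(
        Output("/Project/datasets/out_a"),        # hypothetical paths
        src=Input("/Project/datasets/source"),
    )
    def with_transform_df(src):
        return src.dropDuplicates()

    # transform: inputs/outputs are TransformInput/TransformOutput objects, so you
    # control reading and writing yourself (multiple outputs, file access, etc.).
    @transform(
        out=Output("/Project/datasets/out_b"),
        src=Input("/Project/datasets/source"),
    )
    def with_transform(out, src):
        df = src.dataframe()                        # explicit read
        out.write_dataframe(df.dropDuplicates())    # explicit write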
7 votes, 2 answers

Why is my build hanging / taking a long time to generate my query plan with many unions?

I notice when I run the same code as my example over here but with a union or unionByName or unionAll instead of the join, my query planning takes significantly longer and can result in a driver OOM. Code included here for reference, with a slight…
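
For context, the pattern that tends to produce very deep logical plans is chaining many unions, roughly as in this sketch (not the asker's code); every union adds another node the driver must analyse during query planning, so splitting the pipeline into intermediate datasets is a common way to keep plans small.

    from functools import reduce
    from pyspark.sql import DataFrame

    def union_all(dfs):
        # Each unionByName adds a node to the logical plan; with many inputs the
        # plan the driver has to analyse and optimise grows correspondingly large.
        return reduce(DataFrame.unionByName, dfs)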
6 votes, 1 answer

Code Repository - What exactly is ctx in PySpark for a code repo?

I have seen the use of ctx in a code repo; what exactly is this? Is it a built-in library? When would I use it? I've seen it in examples such as the following: df = ctx.spark.createdataframe(...
Robert F
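
In Python transforms, ctx is the TransformContext that the Transforms framework injects when your compute function declares a parameter named ctx; among other things it exposes the underlying SparkSession. A minimal sketch (the output path and data are made up):

    from transforms.api import transform_df, Output

    @transform_df(Output("/Project/datasets/reference_rates"))  # hypothetical path
    def compute(ctx):
        # ctx is injected because the parameter is named "ctx"; ctx.spark_session
        # gives a SparkSession, useful when a transform has no dataset inputs.
        return ctx.spark_session.createDataFrame(
            [("EUR", 1.00), ("USD", 1.08)],
            ["currency", "rate"],
        )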
5 votes, 0 answers

How to prevent a sort on a groupby.applyInPandas using hash partitioning on the upstream dataset?

In my main transform, I'm running an algorithm by doing a groupBy and then applyInPandas in Foundry. The build takes a very long time, and one idea is to organize the files to prevent shuffle reads and sorting, using hash partitioning/bucketing. For a…
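
For reference, the pattern being discussed looks roughly like this sketch (column names and the per-group logic are illustrative); the groupBy is what introduces the shuffle and sort that hash partitioning/bucketing the upstream dataset is meant to avoid.

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("a", 1.0), ("a", 2.0), ("b", 3.0)], ["entity_id", "value"]
    )

    def run_algorithm(pdf: pd.DataFrame) -> pd.DataFrame:
        # stand-in for the per-group computation
        pdf["score"] = pdf["value"].rank()
        return pdf

    # Each group is gathered onto one executor as a pandas DataFrame, which is why
    # Spark shuffles (and sorts) the data by the grouping key first.
    result = df.groupBy("entity_id").applyInPandas(
        run_algorithm,
        schema="entity_id string, value double, score double",
    )
    result.show()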
5 votes, 1 answer

Best way to modify downstream references to a code workbook dataset to point to the new code repository dataset created using the helper?

When using the "Export to Code Repository Helper" tool in an existing code workbook, what is the most efficient way to modify downstream dependencies to point to the newly created Code Repository dataset? We want to modify all downstream…
5 votes, 1 answer

How do I parse XML documents in Palantir Foundry?

I have a set of .xml documents that I want to parse. I previously tried to parse them using methods that take the file contents and dump them into a single cell; however, I've noticed this doesn't work in practice since I'm seeing slower and…
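
One common approach is to use raw file access from a Python transform together with a standard XML parser; the sketch below is illustrative only (dataset paths, element names, and the driver-side loop are assumptions, and a large number of files would usually be parsed in parallel instead).

    import xml.etree.ElementTree as ET
    from transforms.api import transform, Input, Output

    @transform(
        out=Output("/Project/datasets/parsed_orders"),   # hypothetical paths
        raw=Input("/Project/datasets/raw_xml_files"),
    )
    def compute(ctx, out, raw):
        fs = raw.filesystem()
        rows = []
        for f in fs.ls(glob="*.xml"):                    # files inside the input dataset
            with fs.open(f.path) as fh:
                root = ET.parse(fh).getroot()
                for order in root.iter("order"):         # assumed element name
                    rows.append((f.path, order.get("id"), order.findtext("amount")))
        out.write_dataframe(
            ctx.spark_session.createDataFrame(rows, ["source_file", "order_id", "amount"])
        )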
5 votes, 3 answers

How to union multiple dynamic inputs in Palantir Foundry?

I want to union multiple datasets in Palantir Foundry. The names of the datasets are dynamic, so I would not be able to give the dataset names in transform_df() statically. Is there a way I can dynamically take multiple inputs into transform_df and…
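
One hedged sketch of a workaround, assuming the dataset paths can be produced by code (a config list here) and that the decorator binds each generated Input keyword to the matching argument of the compute function:

    from functools import reduce
    from pyspark.sql import DataFrame
    from transforms.api import transform_df, Input, Output

    # Hypothetical paths; in practice these might be generated from configuration.
    SOURCE_PATHS = [
        "/Project/datasets/source_2022",
        "/Project/datasets/source_2023",
    ]

    @transform_df(
        Output("/Project/datasets/all_sources"),
        **{f"src_{i}": Input(path) for i, path in enumerate(SOURCE_PATHS)},
    )
    def compute(**srcs):
        # srcs maps the generated keyword names to DataFrames; union them all.
        return reduce(DataFrame.unionByName, srcs.values())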
5 votes, 1 answer

How can I iterate over JSON files in Code Repositories and incrementally append to a dataset?

I have imported a dataset with 100,000 raw JSON files of about 100 GB through Data Connection into Foundry. I want to use the Python Transforms raw file access transformation to read the files and flatten arrays of structs and structs into a dataframe as…
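
A rough sketch of that pattern, assuming an incremental transform whose input only lists files added since the last build and whose output appends by default (paths, the JSON layout, and field names are made up):

    import json
    from transforms.api import transform, incremental, Input, Output

    @incremental()                                       # process only newly added files each run
    @transform(
        out=Output("/Project/datasets/flattened"),       # hypothetical paths
        raw=Input("/Project/datasets/raw_json"),
    )
    def compute(ctx, out, raw):
        fs = raw.filesystem()
        rows = []
        for f in fs.ls(glob="*.json"):
            with fs.open(f.path) as fh:
                doc = json.load(fh)
                # flatten an assumed "records" array of structs into plain rows
                for rec in doc.get("records", []):
                    rows.append((f.path, rec.get("id"), rec.get("value")))
        out.write_dataframe(
            ctx.spark_session.createDataFrame(rows, ["source_file", "id", "value"])
        )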
5 votes, 2 answers

How to create Python libraries and how to import them in Palantir Foundry

In order to generalize the Python functions, I wanted to add functions to Python libraries so that I can use these functions across multiple repositories. Could anyone please answer the questions below: 1) How to create our own Python libraries 2) how…
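
At a high level the end state looks like the sketch below: a function defined in a library repository, then imported from any consuming repository once the library has been published and added as a dependency (package and module names are hypothetical).

    # Library repository (created from Foundry's Python library template),
    # e.g. src/myutils/cleaning.py -- "myutils" is a hypothetical package name.

    def standardise_columns(df):
        """Lower-case and snake_case all column names of a Spark DataFrame."""
        for col in df.columns:
            df = df.withColumnRenamed(col, col.strip().lower().replace(" ", "_"))
        return df

    # Consuming repository: after publishing a version of the library and adding
    # "myutils" to the repository's dependencies, import it like any other package:
    #
    #     from myutils.cleaning import standardise_columns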
4 votes, 1 answer

How can I process large files in Code Repositories?

I have a data feed that gives a large .txt file (50-75 GB) every day. The file contains several different schemas within it, where each row corresponds to one schema. I would like to split this into partitioned datasets for each schema; how can I do…
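
One hedged sketch of splitting such a feed, assuming a pipe-delimited file where the first field identifies the record type, and using a transform with several outputs (paths and record-type values are illustrative):

    from pyspark.sql import functions as F
    from transforms.api import transform, Input, Output

    @transform(
        orders_out=Output("/Project/datasets/orders"),   # hypothetical paths
        events_out=Output("/Project/datasets/events"),
        raw=Input("/Project/datasets/raw_feed"),
    )
    def compute(ctx, orders_out, events_out, raw):
        # Read the raw text files backing the input dataset in a distributed way.
        lines = ctx.spark_session.read.text(raw.filesystem().hadoop_path)
        # Assume the first pipe-delimited token names the schema of each row.
        tagged = lines.withColumn("record_type", F.split("value", "\\|").getItem(0))
        orders_out.write_dataframe(tagged.filter(F.col("record_type") == "ORD"))
        events_out.write_dataframe(tagged.filter(F.col("record_type") == "EVT"))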
4 votes, 1 answer

Python unit tests for Foundry's transforms?

I would like to set up tests for my transforms in Foundry, passing test inputs and checking that the output is the expected one. Is it possible to call a transform with dummy datasets (.csv file in the repo) or should I create functions inside the…
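
A common approach is to keep the interesting logic in plain functions and test those with pytest against a local SparkSession, roughly as in this sketch (file layout and names are illustrative):

    # logic.py -- keep the transformation logic in a plain, testable function
    from pyspark.sql import DataFrame, functions as F

    def keep_recent(df: DataFrame, min_year: int) -> DataFrame:
        return df.filter(F.col("year") >= min_year)

    # test_logic.py -- picked up by pytest when tests are enabled for the repository
    import pytest
    from pyspark.sql import SparkSession

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[1]").getOrCreate()

    def test_keep_recent(spark):
        df = spark.createDataFrame([(2019,), (2024,)], ["year"])
        assert [r.year for r in keep_recent(df, 2020).collect()] == [2024]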
4 votes, 1 answer

PySpark: Getting the last date of the previous quarter based on today's date

In a code repo, using PySpark, I'm trying to take today's date and, based on it, retrieve the last day of the prior quarter. This date would then be used to filter out data in a data frame. I was trying to create a dataframe in a code repo…
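
One way to get that date in PySpark: truncate today's date to the start of the current quarter and step back one day (a small standalone sketch):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Start of the current quarter minus one day = last day of the previous quarter.
    df = spark.range(1).select(
        F.current_date().alias("today"),
        F.date_sub(F.date_trunc("quarter", F.current_date()).cast("date"), 1)
         .alias("end_of_prev_quarter"),
    )
    df.show()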
4 votes, 3 answers

How can I hit a Foundry API from Code Repositories?

What is the correct way to hit an internal Foundry API from a Code Repository using, for example, a Python transform?
Adil B
3 votes, 1 answer

You're trying to access a column, but multiple columns have that name

I am trying to join 2 dataframes that both contain the following named columns. What's the best way to do a LEFT OUTER join? df = df.join(df_forecast, ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"], 'left') Currently, I get an error…
x89
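
For context, joining on a list of column names de-duplicates only the join keys; any other column present in both frames stays duplicated and raises this error when referenced later. A small sketch (column names beyond those in the question are made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10.0)], ["D_ACCOUNTS_ID", "amount"])
    df_forecast = spark.createDataFrame([(1, 12.0)], ["D_ACCOUNTS_ID", "amount"])

    # Rename (or drop) the non-key columns that exist on both sides before joining,
    # so every remaining column name is unambiguous.
    joined = df.join(
        df_forecast.withColumnRenamed("amount", "forecast_amount"),
        ["D_ACCOUNTS_ID"],
        "left",
    )
    joined.select("D_ACCOUNTS_ID", "amount", "forecast_amount").show()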
3 votes, 1 answer

Read data that is already in the output and write it back to the output

I have a requirement to read the data that is already in the output, join it to the input, and write the result back to the same output. This build is scheduled every…
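
A hedged sketch of one way this is commonly handled, assuming an incremental transform that reads its own previous output, merges in the new input, and rewrites the output (paths, schema, and the merge rule are illustrative):

    from pyspark.sql import types as T
    from transforms.api import transform, incremental, Input, Output

    SCHEMA = T.StructType([
        T.StructField("id", T.StringType()),
        T.StructField("latest_value", T.DoubleType()),
    ])

    @incremental(snapshot_inputs=["src"])
    @transform(
        out=Output("/Project/datasets/running_state"),   # hypothetical paths
        src=Input("/Project/datasets/daily_input"),
    )
    def compute(out, src):
        # What the output contained after the last build (empty on the first run).
        previous = out.dataframe("previous", SCHEMA)
        combined = previous.unionByName(
            src.dataframe().select("id", "latest_value")
        ).dropDuplicates(["id"])                         # illustrative merge rule
        out.set_mode("replace")                          # rewrite the whole output
        out.write_dataframe(combined)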