Questions tagged [foundry-code-workbooks]

Use this tag for questions about development in Palantir Foundry's Code Workbooks application.

Code Workbooks is Foundry’s advanced analytics and data science suite. Users can write code and visualize their analytical pipelines’ resulting graphs. Each computational step can be saved as a Foundry dataset and made available to other applications. Users may also create and share templates to reuse logic across workbooks.

54 questions
5
votes
1 answer

Best way to modify downstream references to a code workbook dataset to point to the new code repository dataset created using helper?

When using the "Export to Code Repository Helper" tool in an existing code workbook, what is the most efficient way to modify downstream dependencies to point to the newly created Code Repository dataset? We want to modify all downstream…
5
votes
1 answer

How can I iterate over JSON files in Code Repositories and incrementally append to a dataset?

I have imported a dataset with 100,000 raw JSON files, about 100 GB in total, into Foundry through Data Connection. I want to use the Python Transforms raw file access transformation to read the files, flatten arrays of structs and structs into a dataframe as…
4
votes
1 answer

PySpark: getting the last date of the previous quarter based on today's date

In a code repo, using PySpark, I'm trying to take today's date and retrieve the last day of the prior quarter from it. This date would then be used to filter data in a dataframe. I was trying to create a dataframe in a code repo…
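The date arithmetic described in this question can be sketched in plain Python. This is only an illustration under stated assumptions, not the accepted answer: the function name `last_day_of_previous_quarter` is hypothetical, and quarters are assumed to be calendar quarters starting in January, April, July, and October.

```python
from datetime import date, timedelta


def last_day_of_previous_quarter(today: date) -> date:
    """Return the last calendar day of the quarter before the one containing `today`.

    Hypothetical helper; assumes calendar quarters (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec).
    """
    # First month of the quarter that contains `today`: 1, 4, 7, or 10.
    first_month = 3 * ((today.month - 1) // 3) + 1
    # The day before the current quarter starts is the last day of the prior quarter.
    return date(today.year, first_month, 1) - timedelta(days=1)


if __name__ == "__main__":
    print(last_day_of_previous_quarter(date.today()))
```

The resulting `date` could then be passed to a dataframe filter as a literal, so the cutoff is computed once on the driver rather than per row.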
3
votes
1 answer

How do I know my Foundry Job is using AQE?

I hear people mention this AQE feature sometimes and I'm wondering how to verify if my job is using it or not. I'm running transformations both in Code Repositories and Code Workbooks.
3
votes
1 answer

How do I union two datasets in Palantir Foundry within a code workbook?

I need to UNION two datasets in a Code Workbook in Palantir Foundry and I'm not sure how to do that. I want to use PySpark to do this. I'm new to Foundry, please help!
3
votes
1 answer

Import pre-trained deep learning models into Foundry Code Workbooks

How do you import an h5 model locally from Foundry into a Code Workbook? I want to use the Hugging Face library as shown below, and in its documentation the from_pretrained method expects a URL path to where the pretrained model lives. I would…
Mihir
2
votes
1 answer

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of filtering logic. To accomplish this, the idea is to use Hive partitioning by setting the partition column to a column in the dataset (called splittable_column). I checked and the cardinality…
2
votes
1 answer

How do I access a file that I uploaded to a folder from a Transform?

I uploaded an image file into a folder in Foundry, and I want to use it as an input to a Transform. It looks like it's stored as some kind of resource in a service called Blobster, how can I access this file and use it?
2
votes
1 answer

Is it possible to generate PDFs from datasets and save them to Foundry incrementally?

FPDF is a library that can convert a pandas dataframe into nicely formatted PDF reports. Is there a feature in Foundry Code Repositories or Code Workbook to write PDF files into Foundry from a Spark or pandas dataframe? I have a requirement to create a…
2
votes
1 answer

Is it possible to revert a Code Workbook to a previous version?

I'd like to recover an accidentally modified workbook to a previous version.
amy.bananagrams
2
votes
1 answer

How do I access an old transaction of a dataset in Code Workbook?

In Contour you can access old transactions by clicking on the "version" button at the top. How do I do this in Code Workbook?
Andrew St P
1
vote
1 answer

Palantir Foundry Code Workbook: export individual XMLs from a dataset

I have a dataset that has an XML column, and I am trying to export individual XMLs as files, with the filename taken from another column, using Code Workbook. I filtered the rows I want using the code below: def prepare_input(xml_with_debug): from…
asb
1
vote
1 answer

In Palantir Foundry, how do I debug pyspark (or pandas) UDFs since I can't use print statements?

In Code Workbooks, I can use print statements, which appear in the Output section of the Code Workbook (where errors normally appear). This does not work for UDFs, and it also doesn't work in Code Authoring/Repositories. What are ways I can debug…
1
vote
1 answer

When does Spark do a "Scan ExistingRDD"?

I have a job that takes in a huge dataset and joins it with another dataset. The first time it ran, it took a really long time and Spark executed a FileScan parquet when reading the dataset, but in future jobs the query plan shows Scan ExistingRDD…
1
vote
1 answer

How do I make my Spark job run faster using executors?

I know my code is free from antipatterns since I don't have any warnings in my Authoring code editor, so I know my code is doing PySpark operations that are distributed and scalable. My current job has 2 executors assigned to it with 2 cores each,…