Highest Voted 'foundry-python-transform' Questions

7

votes

2 answers

Why is my build hanging / taking a long time to generate my query plan with many unions?

I notice when I run the same code as my example over here but with a union or unionByName or unionAll instead of the join, my query planning takes significantly longer and can result in a driver OOM. Code included here for reference, with a slight…

asked Aug 16 '21 at 17:48

vanhooser

1,497
3
19

5

votes

1 answer

How do I parse xml documents in Palantir Foundry?

I have a set of .xml documents that I want to parse. I have previously tried to parse them using methods that take the file contents and dump them into a single cell, but I've noticed this doesn't work in practice since I'm seeing slower and…

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Dec 03 '21 at 20:55

vanhooser

1,497
3
19

5

votes

2 answers

How to create Python libraries and import them in Palantir Foundry

To generalize my Python functions, I want to add them to Python libraries so that I can use them across multiple repositories. Could anyone answer the questions below? 1) How to create our own Python libraries 2) how…

pyspark conda palantir-foundry foundry-code-repositories foundry-python-transform

asked Oct 13 '20 at 09:30

Gavisha BN

141
1
8

3

votes

1 answer

Shuffle Stage Failing Due To Executor Loss

I get the following error when my Spark job fails: **"org.apache.spark.shuffle.FetchFailedException: The relative remote executor(Id: 21), which maintains the block data to fetch is dead."** Overview of my Spark job: input size is ~35 GB. I have…

apache-spark palantir-foundry foundry-code-repositories foundry-python-transform

asked Jan 26 '22 at 12:38

Arun Mohan

349
4
13

3

votes

1 answer

How can I merge an incremental dataset and a snapshot dataset while retaining deleted rows?

I have a data connection source that creates two datasets: Dataset X (Snapshot) and Dataset Y (Incremental). The two datasets pull from the same source. Dataset X consists of the current state of all rows in the source table. Dataset Y pulls all rows…

palantir-foundry foundry-code-repositories foundry-python-transform

asked Oct 15 '21 at 15:12

tomwhittaker

331
2
8

3

votes

1 answer

Palantir Foundry incremental testing is hard to iterate on, how do I find bugs faster?

I have a pipeline set up in my Foundry instance that is using incremental computation but for some reason isn't doing what I expect. Namely, I want to read the previous output of my transform and get the maximum value of a date, then read the input…

palantir-foundry foundry-code-repositories foundry-python-transform

asked Oct 11 '21 at 19:32

vanhooser

1,497
3
19

3

votes

1 answer

Is there a tool available within Foundry that can automatically populate column descriptions? If so, what is it called?

We are looking to see if there is a tool within the Foundry platform that will allow us to have a list of field descriptions and, when the dataset builds, populate those descriptions automatically. Does this exist and if so what is the tool…

palantir-foundry foundry-code-repositories foundry-python-transform

asked Sep 25 '20 at 15:49

Robert F

187
5

2

votes

1 answer

PySpark Serialized Results too Large OOM for loop in Spark

I have serious difficulties in understanding why I cannot run a transform which, after waiting so many minutes (sometimes hours), returns the error "Serialized Results too large". In the transform I have a list of dates that I am iterating in a for…

pyspark out-of-memory palantir-foundry foundry-python-transform

asked Jan 22 '22 at 15:19

Jresearcher

297
3
13

2

votes

1 answer

Why is my Code Repo warning me about using withColumn in a for/while loop?

I'm noticing my code repo is warning me that using withColumn in a for/while loop is an antipattern. Why is this not recommended? Isn't this a normal use of the PySpark API?

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Dec 16 '21 at 15:46

vanhooser

1,497
3
19

2

votes

2 answers

How do I parse large compressed csv files in Foundry?

I have a large gzipped csv file (.csv.gz) uploaded to a dataset that's about 14GB in size and 40GB when uncompressed. Is there a way to decompress, read, and write it out to a dataset using Python Transforms without causing the executor to OOM?

pyspark palantir-foundry foundry-python-transform

asked Aug 31 '21 at 11:14

vanhooser

1,497
3
19
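
For the compressed-csv question above, one general way to keep memory bounded is to stream the file and handle rows in batches instead of decompressing it whole. The following is a minimal sketch using only the Python standard library, not the accepted answer. The function name `iter_gzip_csv_batches` and the `batch_size` parameter are hypothetical, and the sketch assumes you already have a binary file handle; how that handle is obtained inside a Foundry Python transform is not shown here.

```python
import csv
import gzip
import io
from itertools import islice


def iter_gzip_csv_batches(binary_stream, batch_size=10_000):
    """Yield lists of row dicts from a gzip-compressed CSV stream.

    Hypothetical helper: only one batch of rows is held in memory at a time.
    """
    # gzip.open accepts an existing file object; "rt" decodes to text lazily.
    with gzip.open(binary_stream, mode="rt", newline="", encoding="utf-8") as text:
        reader = csv.DictReader(text)
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch


# Small in-memory demonstration standing in for a real .csv.gz file.
raw = "id,name\n1,a\n2,b\n3,c\n".encode("utf-8")
compressed = io.BytesIO(gzip.compress(raw))
batches = list(iter_gzip_csv_batches(compressed, batch_size=2))
```

Each batch can then be written out before the next one is read, so peak memory depends on `batch_size` rather than on the uncompressed file size.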

1

vote

1 answer

Why is my Code Repo warning me not to use union and instead use unionByName?

I see in my repository it's warning me about using union and instead I should use unionByName. Aren't these the same thing? Why would I care which one to use?

palantir-foundry foundry-code-repositories foundry-python-transform

asked Jan 18 '22 at 14:08

vanhooser

1,497
3
19

1

vote

1 answer

Does a count() over a DataFrame materialize the data to the driver / increase a risk of OOM?

I want to run df.count() on my DataFrame, but I know my total dataset size is pretty large. Does this run the risk of materializing the data back to the driver / increasing my risk of driver OOM?

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Dec 13 '21 at 16:42

vanhooser

1,497
3
19

1

vote

1 answer

How do I add a column indicating the row number from a file on disk?

I want to parse a series of .csv files using spark.read.csv, but I want to include the row number of each line inside the file. I know that Spark typically doesn't order DataFrames unless explicitly told to do so, and I don't want to write my own…

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Nov 16 '21 at 22:33

vanhooser

1,497
3
19
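
To illustrate the row-number question above outside of Spark: when each file is read sequentially, the row number is simply the position of the row in the reader. This is a minimal standard-library sketch under that assumption, not the method from the answer. The function name `rows_with_row_numbers` and the `file_name` and `row_number` output fields are hypothetical.

```python
import csv
import io


def rows_with_row_numbers(text_stream, file_name):
    """Yield one dict per data row, tagged with its 1-based position in the file."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to yield
    for row_number, values in enumerate(reader, start=1):
        record = dict(zip(header, values))
        record["file_name"] = file_name
        record["row_number"] = row_number
        yield record


# In-memory stand-in for one .csv file on disk.
sample = io.StringIO("name,status\nben,active\nann,in-active\n")
numbered = list(rows_with_row_numbers(sample, "example.csv"))
```

Because the numbering is assigned while the file is read in order, it does not depend on any later ordering of the combined data.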

1

vote

1 answer

How to throw a warning if a threshold value is exceeded in Foundry Code Repositories

I have taken an input dataset, applied some transformations to it, and written the result to an output dataset. I have built this output dataset, and now I need to compare the time it took to build against a threshold time…

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Sep 20 '21 at 12:59

Monica Gaddipati

69
2

1

vote

1 answer

How do I compute a range of statuses from a daily indicator?

I have a df in the format of:

| name | status    | date  |
____________________________
| ben  | active    | 01/01 |
| ben  | active    | 01/02 |
| ben  | active    | 01/03 |
| ben  | in-active | 01/04 |
| ben  | in-active | 01/05 |
| ben  | active …

pyspark palantir-foundry foundry-code-repositories foundry-python-transform

asked Aug 13 '21 at 22:06

vanhooser

1,497
3
19
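
The status-range question above is an instance of grouping consecutive rows that share a value. Here is a minimal plain-Python sketch of that idea, using only the rows visible in the excerpt. It is not PySpark and not the answer's method. It assumes the rows are already sorted by name and then date, and that there are no missing days. The function name `status_ranges` is hypothetical.

```python
from itertools import groupby

# Rows visible in the question excerpt: (name, status, date).
rows = [
    ("ben", "active", "01/01"),
    ("ben", "active", "01/02"),
    ("ben", "active", "01/03"),
    ("ben", "in-active", "01/04"),
    ("ben", "in-active", "01/05"),
]


def status_ranges(sorted_rows):
    """Collapse consecutive rows with the same (name, status) into one range."""
    ranges = []
    # groupby only merges adjacent items, which is exactly what a range needs.
    for (name, status), group in groupby(sorted_rows, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in group]
        ranges.append((name, status, dates[0], dates[-1]))
    return ranges


ranges = status_ranges(rows)
```

Because `groupby` starts a new group whenever the key changes, a later return to `active` would produce a separate range rather than being merged with the first one.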
