I want to use the GraphFrames package with PySpark in my Foundry code repository.
As mentioned here: https://www.palantir.com/docs/foundry/transforms-python/environment-troubleshooting/#packages-which-require-both-a-conda-package-and-a-jar

I added the graphframes package to the list of conda libraries to be installed, but I also need to install the server-side jar when the Spark session is initialized. So I went to transforms-python/build.gradle, where I have the following code:

// DO NOT MODIFY THIS FILE
buildscript {
    repositories {
        maven {
            credentials {
                username ''
                password project.transformsBearerToken
            }
            authentication {
                basic(BasicAuthentication)
            }
            url project.transformsMavenProxyRepoUri
        }
    }

    dependencies {
        classpath "com.palantir.transforms:transforms-gradle-plugin:${transformsVersion}"
    }
}

apply plugin: 'com.palantir.transforms-defaults'

dependencies { 
    condaJars 'graphframes:graphframes:0.8.1-spark3.0-s_2.12' 
}

I then save the changes and reload the page to apply them, but I get a Code Assist error:

FAILURE: Build failed with an exception.

* Where:

Build file '/scratch/standalone/1c8fbb49-de4d-4c21-8081-47c92748189a/code-assist/contents/build.gradle' line: 24

* What went wrong:

A problem occurred evaluating root project 'feature-generation'.

> Could not find method condaJars() for arguments [graphframes:graphframes:0.8.1-spark3.0-s_2.12] on object of type org.gradle.api.internal.artifacts.dsl.dependencies.DefaultDependencyHandler.

* Try:

Run with --info or --debug option to get more log output. Run with --scan to get full insights.

Does anyone have any idea why and how to fix this?

Grigory Sharkov

1 Answer

Almost there. The link you provided has the clue:

"Select the option to Show hidden files and folders in the Settings cog, and select the inner transforms-python/build.gradle file. At the bottom of the file, add the following block:"

  1. Check that the version of the jar matches the version of the Python package you added through Conda or PyPI. The most recent version on Conda is 0.7.32, whereas PyPI has 0.6.
  2. Move the dependencies entry (the condaJars line) from the outer build.gradle to the inner transforms-python/build.gradle.

As I'm sure you're aware, the Maven coordinates can be found here: https://mvnrepository.com/artifact/graphframes/graphframes
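
The version check in step 1 can be sketched in Python. This is only a rough illustration, assuming the jar coordinate follows the `<version>-spark<X.Y>-s_<scala>` layout seen in the coordinate from the question. `JarCoordinate`, `parse_graphframes_coordinate` and `is_compatible` are hypothetical helper names, not part of GraphFrames or Foundry, and the Spark version `3.3.0` is just an example value. Comparing only the Spark major version is also an assumption.

```python
import re
from typing import NamedTuple


class JarCoordinate(NamedTuple):
    """Parts of a GraphFrames Maven coordinate (hypothetical helper type)."""
    group: str
    artifact: str
    package_version: str  # e.g. "0.8.1"
    spark_version: str    # e.g. "3.0"
    scala_version: str    # e.g. "2.12"


# Assumed layout: "<group>:<artifact>:<version>-spark<X.Y>-s_<scala>"
_COORDINATE = re.compile(
    r"^(?P<group>[^:]+):(?P<artifact>[^:]+):"
    r"(?P<package>\d+(?:\.\d+)*)-spark(?P<spark>\d+\.\d+)-s_(?P<scala>\d+\.\d+)$"
)


def parse_graphframes_coordinate(coordinate: str) -> JarCoordinate:
    """Split a coordinate string into its parts, or raise ValueError."""
    match = _COORDINATE.match(coordinate)
    if match is None:
        raise ValueError(f"Unrecognised coordinate: {coordinate!r}")
    return JarCoordinate(
        match["group"],
        match["artifact"],
        match["package"],
        match["spark"],
        match["scala"],
    )


def is_compatible(jar: JarCoordinate, python_package_version: str,
                  spark_version: str) -> bool:
    """True if the jar matches the Python package version and Spark major version."""
    same_package = jar.package_version == python_package_version
    # Assumption: only the Spark major version has to agree.
    same_spark_major = jar.spark_version.split(".")[0] == spark_version.split(".")[0]
    return same_package and same_spark_major


jar = parse_graphframes_coordinate("graphframes:graphframes:0.8.1-spark3.0-s_2.12")
print(jar)
print(is_compatible(jar, python_package_version="0.8.1", spark_version="3.3.0"))
print(is_compatible(jar, python_package_version="0.7.32", spark_version="3.3.0"))
```

With the coordinate from the question, the check passes only when the Python package is also 0.8.1, which is the mismatch discussed in the comments below.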

Kraeze
  • Thanks for the feedback. I am using PySpark 3+, and the only jar versions that support it are 0.8+, whereas the dependencies available in the code repository only go up to 0.7.32, which according to Maven has no build for Spark 3+. – Grigory Sharkov Mar 23 '23 at 13:27
  • and what do you mean by "move the dependencies entry to the inner build.gradle"? Could you elaborate, please? – Grigory Sharkov Mar 23 '23 at 13:29
  • Hi @grigory, I had the same issue with the version not being compatible with Spark 3.0. So I downloaded the source from [graphframes](https://github.com/graphframes/graphframes), then imported the Python code into my repo. As for build.gradle, there are two in a repo; the inner build.gradle is where the dependencies line should go (i.e. not the one you're using but the other one). – Kraeze Mar 24 '23 at 06:18
  • Hi, @Kraeze, are you talking about this code? https://github.com/graphframes/graphframes/tree/master/python/graphframes – Grigory Sharkov Mar 27 '23 at 08:53
  • So, to summarize, there are two problems at this stage: (1) the location of the Gradle file. The outer build.gradle does not recognize the condaJars method, so the entry should be moved to the build.gradle file located in the src folder; (2) if you are using PySpark 3+, then conda-forge most probably does not contain the required version of the Python library. In that case it should be imported as code and used in your repository. Thanks a lot, @Kraeze. – Grigory Sharkov Mar 27 '23 at 13:58
  • Hi @GrigorySharkov, the code you've linked is exactly what you 'import' into your repo, as it's laid out (make sure the code matches the version included in the condaJars entry). It's definitely not the way to go for prod work, but if you want to get it working, that's the way I found. As for build.gradle, an instance should already exist in your sub-project; if not, I would recommend creating a new repo and then trying the outlined steps. – Kraeze Mar 28 '23 at 23:14