Until last week, both the kedro and kedro[spark.SparkDataSet] pip libraries were installed on the cluster. For the past 3-4 days, however, they will not install together: the cluster reports a duplicate library, and my code fails because SparkDataSet cannot be found. If I install only kedro, I get the error shown in the screenshot below.

Msant

2 Answers

To install Kedro, follow the installation prerequisites in the Kedro documentation.

Install Kedro

To install Kedro from the Python Package Index (PyPI) simply run:

pip install kedro
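
To confirm what the notebook's interpreter actually sees after installing, a small check like the one below can help. It is a sketch that assumes the Kedro releases discussed in this thread, where the Spark dataset lives in `kedro.extras.datasets.spark` (the same path as the sample code). It also assumes that the package may import while `SparkDataSet` itself is missing when the Spark extras' dependencies are absent, which would match the "SparkDataSet not found" symptom. `report_kedro_install` is a hypothetical helper name.

```python
import importlib
from importlib import metadata


def report_kedro_install() -> dict:
    """Report the installed Kedro version and whether SparkDataSet is importable."""
    report = {"kedro": None, "spark_dataset": False, "error": None}
    try:
        report["kedro"] = metadata.version("kedro")
    except metadata.PackageNotFoundError:
        report["error"] = "kedro is not installed in this environment"
        return report
    try:
        # Module path used in the sample code (kedro.extras, as in the
        # Kedro releases discussed in this thread)
        spark_module = importlib.import_module("kedro.extras.datasets.spark")
    except ImportError as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report
    # Assumption: the package can import while the class is missing,
    # e.g. when the Spark extras' dependencies are not installed
    report["spark_dataset"] = hasattr(spark_module, "SparkDataSet")
    if not report["spark_dataset"]:
        report["error"] = "SparkDataSet not exported; Spark extras may be missing"
    return report


if __name__ == "__main__":
    print(report_kedro_install())
```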

Sample code:

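# Requires the Spark extras, e.g. pip install "kedro[spark.SparkDataSet]"
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StringType,
                               IntegerType, StructType)

from kedro.extras.datasets.spark import SparkDataSet

# Schema and rows for a small test DataFrame
schema = StructType([StructField("name", StringType(), True),
                     StructField("age", IntegerType(), True)])

data = [('Alex', 31), ('Bob', 12), ('Clarke', 65), ('Dave', 29)]

spark_df = SparkSession.builder.getOrCreate().createDataFrame(data, schema)

# Save as Parquet (SparkDataSet's default format) and read it back;
# mode "overwrite" lets the cell be re-run when test_data already exists
data_set = SparkDataSet(filepath="test_data", save_args={"mode": "overwrite"})
data_set.save(spark_df)
reloaded = data_set.load()

reloaded.take(4)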

Abhishek K
  • Yes, thanks. After doing this I'm getting this error: `DataSetError: No module named 'fsspec.asyn'. Failed to instantiate DataSet .` Please note that I have added fsspec to the cluster and pip installed it in the notebook. – Msant May 26 '22 at 02:44
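
The error means the `fsspec` module the interpreter actually loads has no `asyn` submodule, i.e. it is an older copy than the one Kedro expects. The sketch below reports which copy is loaded (its `__version__` and file location) and whether `fsspec.asyn` can be found. On a Databricks cluster, a cluster-level library and a notebook `pip` install can differ; that this is the cause here is an assumption, not something confirmed in the thread. `fsspec_status` is a hypothetical helper name.

```python
import importlib
import importlib.util


def fsspec_status() -> dict:
    """Show which fsspec this interpreter loads and whether fsspec.asyn exists."""
    status = {"version": None, "location": None, "has_asyn": False}
    try:
        fsspec = importlib.import_module("fsspec")
    except ImportError:
        return status  # fsspec is not importable from this interpreter
    # Version and path of the copy that is actually imported, which may
    # differ from the one you just pip installed
    status["version"] = getattr(fsspec, "__version__", None)
    status["location"] = getattr(fsspec, "__file__", None)
    status["has_asyn"] = importlib.util.find_spec("fsspec.asyn") is not None
    return status


if __name__ == "__main__":
    print(fsspec_status())
```

If `has_asyn` is `False` while a newer fsspec is installed, the location usually points at the copy that takes precedence on `sys.path`.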

You don't need to install both: `pip install "kedro[spark.SparkDataSet]==0.16.3"` is a superset of `pip install kedro==0.16.3`, because it installs Kedro itself plus the extra dependencies SparkDataSet needs.
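
The superset relationship is visible in the package metadata: each requirement that belongs to an extra carries an `extra == "..."` marker, and everything without one is installed by plain `pip install kedro`. The sketch below groups the strings returned by `importlib.metadata.requires("kedro")` by extra. The helper names are hypothetical, the marker parsing is a simple regex rather than a full PEP 508 parser, and extra names may appear normalized (for example lowercased) depending on how the package was built.

```python
import re
from importlib import metadata
from typing import Dict, List, Optional

# Matches the extra name in markers such as: extra == "spark.SparkDataSet"
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_requirements(requirements: List[str]) -> Dict[Optional[str], List[str]]:
    """Group requirement strings by extra (key None = always installed)."""
    groups: Dict[Optional[str], List[str]] = {}
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        key = match.group(1) if match else None
        groups.setdefault(key, []).append(spec.strip())
    return groups


def kedro_requirements() -> Dict[Optional[str], List[str]]:
    """Group the installed Kedro's declared requirements; empty if not installed."""
    try:
        return group_requirements(metadata.requires("kedro") or [])
    except metadata.PackageNotFoundError:
        return {}


if __name__ == "__main__":
    # Hypothetical requirement strings, only to show the grouping
    sample = ['fsspec>=2021.4', 'pyspark>=2.2 ; extra == "spark.SparkDataSet"']
    print(group_requirements(sample))
    # {None: ['fsspec>=2021.4'], 'spark.SparkDataSet': ['pyspark>=2.2']}
    print(kedro_requirements())
```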

datajoely
  • I get the error shown in the edited post; that is why I had both installed in the first place. – Msant May 26 '22 at 02:45
  • Looking at your response to the other answer, I think you may have a conflicting library installed on the Databricks cluster. Is anything installed that would conflict with the version of `fsspec` required by this version of Kedro? – datajoely May 27 '22 at 06:20