
I am writing PySpark pipelines with the Databricks Connect library, which is installed in a virtual environment. The steps I followed are given here.

When I try to execute this code:

spark.read.load(path).first()

I get this error:

<class 'TypeError'>, 'JavaPackage' object is not callable, <traceback object at 0x0000017AB70ECF88>
Traceback (most recent call last):
  File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 83, in <module>
    run()
  File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 79, in run
    el_job.run()
  File "D:\Friendsurance\Repository\data-ingestion\job\task\__init__.py", line 18, in run
    data: DataFrame = self.extract()
  File "D:\Friendsurance\Repository\data-ingestion\job\task\ELTask.py", line 14, in extract
    return self.extractor.extract()
  File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\BucketExtractor.py", line 26, in extract
    self.spark, self.load_storage.get_path(), self.conf.partition_column
  File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\__init__.py", line 14, in calculate_last_day_run
    spark.read.load(path).first().show()
  File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1381, in first
    return self.head()
  File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1369, in head
    rs = self.head(1)
  File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1371, in head
    return self.take(n)
  File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 657, in take
    return self.limit(num).collect()
  File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 596, in collect
    if self._sc._conf.get(self._sc._jvm.PythonSecurityUtils.USE_FILE_BASED_COLLECT()):
TypeError: 'JavaPackage' object is not callable

However, when I run the same line outside the virtual environment, using the PySpark library provided here, it executes and returns the expected output.

Can anyone tell me what I am doing wrong?
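Because the same line behaves differently inside and outside the virtual environment, it may help to confirm which interpreter and which `pyspark` installation each environment actually uses. The following is a minimal diagnostic sketch, assuming Python 3.8+ (for `importlib.metadata`) and assuming the PyPI distribution names `pyspark` and `databricks-connect`; the helper name `describe_spark_packages` is hypothetical and not part of either library. Running it once with the virtual environment activated and once without it lets you compare the two reports side by side.

```python
import importlib.metadata
import importlib.util
import sys


def describe_spark_packages(names=("pyspark", "databricks-connect")):
    """Hypothetical helper: report the interpreter and installed versions.

    The distribution names are assumptions; pass others if your setup differs.
    """
    report = {"python": sys.executable}  # which interpreter is running
    for name in names:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    # Where `import pyspark` would be resolved from, if it is importable at all.
    spec = importlib.util.find_spec("pyspark")
    report["pyspark_location"] = spec.origin if spec else None
    return report


if __name__ == "__main__":
    for key, value in describe_spark_packages().items():
        print(f"{key}: {value}")
```

If the two runs report different `pyspark` versions or locations, that difference is a reasonable place to start looking.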

Alex Ott
Trijit
