I am working on replacing the pandas API with the Koalas API in my project.
I am trying to read a Parquet file from a location, but I am getting the error below.
import databricks.koalas as ks
import pandas as pd  # only needed for the working comparison below

# Koalas call that raises the error below
kdf = ks.read_parquet(path, columns=column_names)
# The equivalent pandas call works (see the note at the end)
kdf = pd.read_parquet(path, columns=column_names)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\databricks\koalas\namespace.py",
line 773, in read_parquet
kdf = read_spark_io(path = path, format = "parquet", options = options, index_col = index_col)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\databricks\koalas\namespace.py",
line 676, in read_spark_io
sdf = default_session().read.load(path = path, format = format, schema = schema, ** options)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\databricks\koalas\utils.py",
line 456, in default_session
session = builder.getOrCreate()
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\pyspark\sql\session.py", line
228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\pyspark\context.py", line 392,
in getOrCreate
SparkContext(conf = conf or SparkConf())
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\pyspark\context.py", line 144,
in __init__
SparkContext._ensure_initialized(self, gateway = gateway, conf = conf)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\pyspark\context.py", line 339,
in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\site-packages\pyspark\java_gateway.py", line
101, in launch_gateway
proc = Popen(command, ** popen_kwargs)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "C:\Users\eapasnr\Anaconda3\envs\oden1\lib\subprocess.py", line 1207, in _execute_child
startupinfo)
File "c:\Users\eapasnr\.vscode\extensions\ms-python.python-
2022.2.1924087327\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydev_bundle\
pydev_monkey.py",
line 860, in new_CreateProcess
return getattr(_subprocess, original_name)(app_name, cmd_line, * args)
FileNotFoundError: [WinError 2] The system cannot find the file specified
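From the traceback, the failure happens before any file is actually read: default_session() tries to start a local Spark context and launch_gateway fails to spawn the Spark/Java subprocess. As a quick diagnostic (my assumption being that the "file" Windows cannot find is the Spark/Java launcher itself, not the Parquet file), I checked what the same interpreter can see:

import os
import shutil

# Environment variables and executables a local PySpark launch typically depends on
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
print("java on PATH:", shutil.which("java"))
print("spark-submit on PATH:", shutil.which("spark-submit"))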
If I replace Koalas with pandas, I am able to read the Parquet file.
I am running the code in Visual Studio Code, not on the Databricks platform. databricks --version gives 0.16.4.
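For what it's worth, the problem should be reproducible without Koalas at all, since (per the traceback) read_parquet just asks for the default SparkSession. A minimal sketch, assuming a plain local session is enough to trigger the same gateway launch (the appName is arbitrary):

from pyspark.sql import SparkSession

# Creating the session directly goes through the same launch_gateway path,
# so it should raise the same FileNotFoundError in this environment.
spark = SparkSession.builder.master("local[*]").appName("koalas-repro").getOrCreate()
print(spark.version)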