
I have a pandas DataFrame (in a sample program) that I convert to a Koalas DataFrame, and I now want to run it on a Spark cluster (Windows standalone). When I run the following from the command prompt:

spark-submit --master local hello.py

I get the error ModuleNotFoundError: No module named 'databricks'.

import pandas as pd
from databricks import koalas as ks

# Raw string so the Windows path separators are not treated as escapes.
workbook_loc = r"c:\2020\Book1.xlsx"

# Read the sheet with pandas on the driver, then convert to a Koalas DataFrame.
df = pd.read_excel(workbook_loc, sheet_name='Sheet1')
kdf = ks.from_pandas(df)
print(kdf)
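
(As a side note, Koalas also exposes ks.read_excel, which skips the manual pandas-to-Koalas conversion; a minimal sketch, assuming the same file and sheet:

from databricks import koalas as ks

# Read the sheet straight into a Koalas DataFrame. Pandas is still used
# under the hood for the Excel parsing, but no explicit conversion is needed.
workbook_loc = r"c:\2020\Book1.xlsx"
kdf = ks.read_excel(workbook_loc, sheet_name='Sheet1')
print(kdf.head())

)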

What should I change so that I can make use of the Spark cluster's features? My actual program, written in pandas, does many things, and I want to use the Spark cluster to see performance improvements.
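
(For context on the performance question: once kdf exists, pandas-style calls on it compile to Spark jobs, which is where the cluster helps; the single-file Excel read itself is not distributed. A rough sketch, where the column names 'category' and 'amount' are invented purely for illustration and are not from the real Book1.xlsx:

from databricks import koalas as ks

kdf = ks.read_excel(r"c:\2020\Book1.xlsx", sheet_name='Sheet1')

# Hypothetical columns; replace with real ones from the workbook.
# This groupby/sum executes as a distributed Spark aggregation, not in pandas.
totals = kdf.groupby('category')['amount'].sum()
print(totals.sort_index())

)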

Kumar Prvn

1 Answer


You should install Koalas through the cluster's admin UI (Libraries → PyPI); if you just run pip install koalas on the cluster yourself, it won't work.
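
For a Windows standalone setup like the one in the question, where there is no Libraries UI, the equivalent check is that koalas is installed in the interpreter spark-submit actually launches (pip install koalas in that environment, with PYSPARK_PYTHON pointing at it if several Pythons are installed). A minimal probe that can go at the top of hello.py:

import sys

# Koalas must be installed for this exact interpreter.
print("Python executable:", sys.executable)

try:
    from databricks import koalas as ks
    print("koalas version:", ks.__version__)
except ModuleNotFoundError:
    sys.exit("koalas is missing here; install it with this interpreter's pip")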

siprob