
I have a pandas DataFrame (in a sample program) that I convert to a Koalas DataFrame, and I now want to run it on a Spark cluster (Windows standalone). When I run the following from the command prompt:

spark-submit --master local hello.py

I get the error ModuleNotFoundError: No module named 'databricks'.

import pandas as pd
from databricks import koalas as ks

# Raw string so the Windows path separators are not treated as escapes.
workbook_loc = r"c:\2020\Book1.xlsx"

# Read the sheet with pandas on the driver, then convert to a Koalas DataFrame.
df = pd.read_excel(workbook_loc, sheet_name='Sheet1')
kdf = ks.from_pandas(df)
print(kdf)
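
(As a side note, Koalas also exposes ks.read_excel, which skips the manual pandas-to-Koalas conversion; a minimal sketch, assuming the same file and sheet:

from databricks import koalas as ks

# Read the sheet straight into a Koalas DataFrame. Pandas is still used
# under the hood for the Excel parsing, but no explicit conversion is needed.
workbook_loc = r"c:\2020\Book1.xlsx"
kdf = ks.read_excel(workbook_loc, sheet_name='Sheet1')
print(kdf.head())

)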

What should I change so that I can make use of the Spark cluster's features? My actual program, written in pandas, does many things, and I want to use the Spark cluster to see performance improvements.
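
(For context on the performance question: once kdf exists, pandas-style calls on it compile to Spark jobs, which is where the cluster helps; the single-file Excel read itself is not distributed. A rough sketch, where the column names 'category' and 'amount' are invented purely for illustration and are not from the real Book1.xlsx:

from databricks import koalas as ks

kdf = ks.read_excel(r"c:\2020\Book1.xlsx", sheet_name='Sheet1')

# Hypothetical columns; replace with real ones from the workbook.
# This groupby/sum executes as a distributed Spark aggregation, not in pandas.
totals = kdf.groupby('category')['amount'].sum()
print(totals.sort_index())

)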

Kumar Prvn

1 Answer


You should install Koalas through the cluster's admin UI (Libraries → PyPI); if you just run pip install koalas on the cluster yourself, it won't work.
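
For a Windows standalone setup like the one in the question, where there is no Libraries UI, the equivalent check is that koalas is installed in the interpreter spark-submit actually launches (pip install koalas in that environment, with PYSPARK_PYTHON pointing at it if several Pythons are installed). A minimal probe that can go at the top of hello.py:

import sys

# Koalas must be installed for this exact interpreter.
print("Python executable:", sys.executable)

try:
    from databricks import koalas as ks
    print("koalas version:", ks.__version__)
except ModuleNotFoundError:
    sys.exit("koalas is missing here; install it with this interpreter's pip")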

siprob