I am trying to run a Python program with spark-submit in local mode, pointing it at a Python virtual environment, and it still runs without failing even though pyspark is not installed in that virtual environment.
Details of what I have tried for testing are below:
- I first ran with the pytest package uninstalled, to check whether the passed Python executable is actually being used; that part seems to work as expected:
PS C:\Users\demouser\Desktop\pytest_demo> spark-submit --master local[*] --conf spark.pyspark.python="C:\Users\demouser\Desktop\pytest_demo\pyspark-env\Scripts\python.exe" .\src\main.py 2>error.txt
Traceback (most recent call last):
File "C:/Users/demouser/Desktop/pytest_demo/./src/main.py", line 1, in <module>
import pytest
ModuleNotFoundError: No module named 'pytest'
PS C:\Users\demouser\Desktop\pytest_demo> .\pyspark-env\Scripts\Activate.ps1
(pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> pip install pytest
Collecting pytest
Using cached pytest-7.1.2-py3-none-any.whl (297 kB)
Requirement already satisfied: tomli>=1.0.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (2.0.1)
Requirement already satisfied: pluggy<2.0,>=0.12 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.0.0)
Requirement already satisfied: colorama in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (0.4.5)
Requirement already satisfied: py>=1.8.2 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.11.0)
Requirement already satisfied: atomicwrites>=1.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.4.1)
Requirement already satisfied: attrs>=19.2.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (22.1.0)
Requirement already satisfied: packaging in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (21.3)
Requirement already satisfied: iniconfig in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.1.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from packaging->pytest) (3.0.9)
Installing collected packages: pytest
Successfully installed pytest-7.1.2
(pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> deactivate
- After installing the package above, the ImportError goes away. Note that pyspark has still not been installed with pip at this point, yet the job runs fine:
PS C:\Users\demouser\Desktop\pytest_demo> spark-submit --master local[*] --conf spark.pyspark.python="C:\Users\demouser\Desktop\pytest_demo\pyspark-env\Scripts\python.exe" .\src\main.py 2>error.txt
============================= test session starts =============================
platform win32 -- Python 3.8.12, pytest-7.1.2, pluggy-1.0.0
rootdir: C:\Users\demouser\Desktop\pytest_demo\test, configfile: pytest.ini
collected 6 items
test\unit\test_factory.py::TestSparkSession::test_sparksession PASSED [ 16%]
test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count0] PASSED [ 33%]
test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count1] PASSED [ 50%]
test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count2] PASSED [ 66%]
test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count3] PASSED [ 83%]
test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count4] PASSED [100%]
============================== warnings summary ===============================
..\..\..\..\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\context.py:75
unit/test_factory.py::TestSparkSession::test_sparksession
C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\context.py:75: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================= 6 passed, 2 warnings in 26.90s ========================
INFO __main__:main.py: pytest session finished
- Spark is installed system-wide, with the version below:
PS C:\Users\demouser\Desktop\pytest_demo> pyspark --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.1
/_/
Using Scala version 2.12.10, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_181
Branch HEAD
Compiled by user ubuntu on 2020-08-28T07:36:48Z
Revision 2b147c4cd50da32fe2b4167f97c8142102a0510d
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.
- Below are the packages present in the pyspark-env virtual environment:
PS C:\Users\demouser\Desktop\pytest_demo> .\pyspark-env\Scripts\Activate.ps1
(pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> pip freeze
atomicwrites==1.4.1
attrs==22.1.0
colorama==0.4.5
iniconfig==1.1.1
packaging==21.3
pluggy==1.0.0
py==1.11.0
pyparsing==3.0.9
pytest==7.1.2
tomli==2.0.1
(pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> python --version
Python 3.8.12
Please advise why this does not fail even in the absence of the pyspark package in the virtual environment.