
I am trying to run a Python program with spark-submit in local mode, using a virtual environment for Python, and it still runs without failing even though pyspark is not installed in the virtual env.

Details of what I have tried for testing are below:

  1. Tried running after uninstalling the pytest package, to check whether the passed Python executable is actually used; it seems to work as expected:
    PS C:\Users\demouser\Desktop\pytest_demo> spark-submit --master local[*] --conf spark.pyspark.python="C:\Users\demouser\Desktop\pytest_demo\pyspark-env\Scripts\python.exe" .\src\main.py 2>error.txt
    Traceback (most recent call last):
      File "C:/Users/demouser/Desktop/pytest_demo/./src/main.py", line 1, in <module>
        import pytest
    ModuleNotFoundError: No module named 'pytest'
    PS C:\Users\demouser\Desktop\pytest_demo> .\pyspark-env\Scripts\Activate.ps1
    
    (pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> pip install pytest
    Collecting pytest
      Using cached pytest-7.1.2-py3-none-any.whl (297 kB)
    Requirement already satisfied: tomli>=1.0.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (2.0.1)
    Requirement already satisfied: pluggy<2.0,>=0.12 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.0.0)
    Requirement already satisfied: colorama in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (0.4.5)
    Requirement already satisfied: py>=1.8.2 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.11.0)
    Requirement already satisfied: atomicwrites>=1.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.4.1)
    Requirement already satisfied: attrs>=19.2.0 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (22.1.0)
    Requirement already satisfied: packaging in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (21.3)
    Requirement already satisfied: iniconfig in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from pytest) (1.1.1)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in c:\users\demouser\desktop\pytest_demo\pyspark-env\lib\site-packages (from packaging->pytest) (3.0.9)
    Installing collected packages: pytest
    Successfully installed pytest-7.1.2
    (pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> deactivate
  1. After installing the above package, the ImportError goes away. However, pyspark has still not been installed with pip up to this step, yet the job runs fine:
    PS C:\Users\demouser\Desktop\pytest_demo> spark-submit --master local[*] --conf spark.pyspark.python="C:\Users\demouser\Desktop\pytest_demo\pyspark-env\Scripts\python.exe" .\src\main.py 2>error.txt
    ============================= test session starts =============================
    platform win32 -- Python 3.8.12, pytest-7.1.2, pluggy-1.0.0
    rootdir: C:\Users\demouser\Desktop\pytest_demo\test, configfile: pytest.ini
    collected 6 items
    
    test\unit\test_factory.py::TestSparkSession::test_sparksession PASSED    [ 16%]
    test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count0] PASSED [ 33%]
    test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count1] PASSED [ 50%]
    test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count2] PASSED [ 66%]
    test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count3] PASSED [ 83%]
    test\unit\test_factory.py::TestFactory::test_five_dfs_count[Test_df_count4] PASSED [100%]
    
    ============================== warnings summary ===============================
    ..\..\..\..\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\context.py:75
    unit/test_factory.py::TestSparkSession::test_sparksession
      C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\context.py:75: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
    
    -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
    ======================= 6 passed, 2 warnings in 26.90s ========================
    INFO     __main__:main.py: pytest session finished
  1. I have pyspark installed system-wide, with the version below:
    PS C:\Users\demouser\Desktop\pytest_demo> pyspark --version
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
          /_/
    
    Using Scala version 2.12.10, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_181
    Branch HEAD
    Compiled by user ubuntu on 2020-08-28T07:36:48Z
    Revision 2b147c4cd50da32fe2b4167f97c8142102a0510d
    Url https://gitbox.apache.org/repos/asf/spark.git
    Type --help for more information.
  1. Below are the packages present in the pyspark-env virtual environment:
    PS C:\Users\demouser\Desktop\pytest_demo> .\pyspark-env\Scripts\Activate.ps1
    (pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> pip freeze
    atomicwrites==1.4.1
    attrs==22.1.0
    colorama==0.4.5
    iniconfig==1.1.1
    packaging==21.3
    pluggy==1.0.0
    py==1.11.0
    pyparsing==3.0.9
    pytest==7.1.2
    tomli==2.0.1
    (pyspark-env) PS C:\Users\demouser\Desktop\pytest_demo> python --version
    Python 3.8.12
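To double-check the venv state programmatically (a hypothetical helper, not part of the runs above), the following can be executed with the venv's `python.exe`; it reports which interpreter is running, whether a pip-installed `pyspark` distribution exists, and where (if anywhere) the `pyspark` module would be imported from:

```python
import importlib.metadata
import importlib.util
import sys


def pyspark_status():
    """Report how (or whether) pyspark is visible to this interpreter."""
    try:
        dist_version = importlib.metadata.version("pyspark")
    except importlib.metadata.PackageNotFoundError:
        dist_version = None  # not pip-installed in this environment
    # find_spec locates the module on sys.path without importing it.
    spec = importlib.util.find_spec("pyspark")
    return {
        "interpreter": sys.executable,
        "pip_version": dist_version,
        "module_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in pyspark_status().items():
        print(f"{key}: {value}")
```

Run directly in the activated venv, `pip_version` and `module_origin` should both come back `None`, matching the `pip freeze` output above.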

Please advise why it is not failing even in the absence of the pyspark package in the virtual environment.
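One clue is the warning path in the step 2 output, which shows `pyspark` being loaded from `pyspark.zip` under `C:\Spark\spark-3.0.1-bin-hadoop2.7`, not from the venv. A few lines like the following (a hypothetical sketch; `src\main.py`'s contents are not shown above) could be added to the submitted script to confirm where the driver resolves `pyspark` from:

```python
import importlib.util
import sys


def spark_path_report():
    """Show sys.path entries mentioning spark and where pyspark resolves."""
    # spark-submit makes SPARK_HOME's python libraries visible to the
    # driver process, so spark-related entries here would explain a
    # successful pyspark import without a pip-installed copy.
    spark_entries = [p for p in sys.path if "spark" in p.lower()]
    spec = importlib.util.find_spec("pyspark")
    return spark_entries, (spec.origin if spec else None)


if __name__ == "__main__":
    entries, origin = spark_path_report()
    print("spark-related sys.path entries:", entries)
    print("pyspark resolves to:", origin)
```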
