
Whenever I start pyspark, it comes up in Python 2.6.6. How can I make pyspark use Python 3.6 instead? Python 3.6 is already installed. I am using the Cloudera QuickStart VM 5.8, with Spark 1.6.0.

I can start 3.6 by typing `python3.6`, and Python 2.6.6 by typing `python`. I have read that CentOS itself depends on Python 2.6.6, so I cannot upgrade it without risking breaking the OS. I have already changed the system PATH variable, but that did not start the Spark context: I get an error that `sc` (the SparkContext) is not defined.
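For reference, the PATH change I made looked roughly like this (a sketch; the `/usr/local/bin` location is an assumption about where my Python 3.6 build was installed):

```bash
# Put Python 3.6 ahead of the system interpreter on PATH
# (/usr/local/bin is an assumption about where python3.6 landed)
export PATH=/usr/local/bin:$PATH

python --version     # still 2.6.6 -- the CentOS system Python is untouched
python3.6 --version  # Python 3.6.3
```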

I also tried the following:

```
[cloudera@quickstart ~]$ export PYSPARK_PYTHON=python3.6
[cloudera@quickstart ~]$ pyspark
Python 3.6.3 (default, Dec 31 2017, 07:08:36) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/usr/lib/spark/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/usr/lib/spark/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/usr/lib/spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/usr/local/lib/python3.6/pydoc.py", line 59, in <module>
    import inspect
  File "/usr/local/lib/python3.6/inspect.py", line 361, in <module>
    Attribute = namedtuple('Attribute', 'name kind defining_class object')
  File "/usr/lib/spark/python/pyspark/serializers.py", line 381, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
>>>
```
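Digging into that traceback: the failure is inside Spark's `pyspark/serializers.py`, which replaces `collections.namedtuple` with a patched copy and then calls the copy via `_old_namedtuple(*args, **kwargs)`. My reading (the copy mechanism is an assumption about how Spark 1.6 duplicates the function) is that Python 3.6 made `verbose`, `rename`, and `module` keyword-only, storing their defaults in `__kwdefaults__`, and a copy made with `types.FunctionType` silently drops that attribute. A minimal sketch that reproduces the exact error message:

```python
import types

# Stand-in for Python 3.6's collections.namedtuple signature,
# whose last three parameters are keyword-only with defaults.
def original(typename, field_names, *, verbose=False, rename=False, module=None):
    return (typename, field_names, verbose, rename, module)

# Rebuild the function the way Spark 1.6 appears to copy namedtuple.
# types.FunctionType does not carry over __kwdefaults__, so the
# keyword-only parameters lose their defaults and become required.
copied = types.FunctionType(
    original.__code__, original.__globals__, original.__name__,
    original.__defaults__, original.__closure__,
)

print(original("Point", "x y"))  # works: the defaults still apply
try:
    copied("Point", "x y")
except TypeError as e:
    print(e)  # missing 3 required keyword-only arguments:
              # 'verbose', 'rename', and 'module'
```

If that is what is happening, no environment variable can fix it for Spark 1.6; this matches the comment below that Python 3.6 is not supported by Spark 1.6, so the options seem to be an older Python 3.x (3.4/3.5) or a newer Spark.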
Ravi
  • Possible duplicate of [Apache Spark: How to use pyspark with Python 3](https://stackoverflow.com/questions/30279783/apache-spark-how-to-use-pyspark-with-python-3). Could you try changing the environment variable: `export PYSPARK_PYTHON=python3`? – Denny Lee Dec 31 '17 at 21:48
  • I did that; still the same issue. I have edited my question with more details. – Ravi Jan 01 '18 at 17:10
  • Can you please try this (change the Python installation path to yours; I have just given mine): `export PYSPARK_PYTHON=/home/cloudera/anaconda3/bin/python` and `export PYSPARK_DRIVER_PYTHON=/home/cloudera/anaconda3/bin/python` – user3858193 Jan 26 '18 at 21:33
  • If there is a solution to this, please add it as an answer! – Mike Pone Feb 22 '18 at 17:23
  • Possible duplicate of [Unable to run pyspark](https://stackoverflow.com/questions/42349980/unable-to-run-pyspark) – it looks like Python 3.6 is not supported by Spark 1.6. – Burrito Apr 11 '19 at 14:39

0 Answers