I want to connect to Cassandra using PySpark from Google Colab. I have written the following code, which downloads the Spark distribution and sets the path variables for Spark and Java:
!wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
!tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz
!pip install findspark
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.2-bin-hadoop3.2"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 pyspark-shell'
conf = SparkConf()
conf.setAppName("Spark Cassandra")
conf.set("spark.cassandra.connection.host", "host") \
    .set("spark.cassandra.auth.username", "username") \
    .set("spark.cassandra.auth.password", "password")
sc = SparkContext(conf=conf)
sql = SQLContext(sc)
dataFrame = sql.read.format("org.apache.spark.sql.cassandra").options(table="table", keyspace="database").load()
dataFrame.printSchema()
When I execute this, it creates the Spark context but then raises an error referencing "org.apache.spark.sql.cassandra". I guess I have to download the connector separately and include it in my path, or I have included it in the wrong way. If there is any solution, please help. This is in Google Colab.
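For reference, the connector is normally pulled in with `--packages` (which resolves a Maven coordinate and downloads the jar plus its dependencies) rather than `--jars` (which expects a path to a jar that already exists locally). A minimal sketch of how that environment variable could be built, assuming the Spark 3.1.2 / Scala 2.12 build used above (the coordinate must match your Spark and Scala versions):

```python
import os

# Maven coordinate of the Cassandra connector matching Spark 3.1.x / Scala 2.12.
# '--packages' tells spark-submit to resolve and download this jar from Maven
# Central at startup, unlike '--jars', which expects an existing local file.
connector = "com.datastax.spark:spark-cassandra-connector_2.12:3.1.0"

# Must be set before the SparkContext is created, or it has no effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--packages {connector} pyspark-shell"

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Setting this before `SparkContext(conf=conf)` is created should make `format("org.apache.spark.sql.cassandra")` resolvable; if the variable is set after the context exists, it is ignored.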