I'm trying to read a Kinesis stream using Spark/Python in a Jupyter notebook provided by AWS. I took the code from the AWS documentation, but when I try to create a DataFrame from Kinesis I get a dependency error. I thought all the dependencies were in place because I created the notebook with the "SparkMagic PySpark" kernel. Here is my code:
import sys
from datetime import datetime
import boto3
import base64
from pyspark.sql import DataFrame, Row
from pyspark.context import SparkContext
from pyspark.sql.types import *
from pyspark.sql.functions import *
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue import DynamicFrame
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
#ssc = StreamingContext(sc, 1)
data_frame_DataSource0 = glueContext.create_data_frame.from_catalog(
    database="***",
    table_name="***",
    transformation_ctx="DataSource0",
    additional_options={"startingPosition": "latest", "inferSchema": "false"},
)
print("Start")
job.commit()
And here is the error I get:
I looked at the website that lists the Spark libraries, but I don't really know which one is missing or how to add it to a notebook.
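The error text itself isn't shown, so the following is only a guess. It is a minimal sketch that assumes the missing dependency is Spark's Kinesis connector, `spark-streaming-kinesis-asl`, which `KinesisUtils` from `pyspark.streaming.kinesis` calls into on the JVM side. It also assumes the notebook's SparkMagic/Livy session is allowed to download Maven packages. SparkMagic's `%%configure -f` cell magic recreates the session with the JSON settings it is given. The two helpers below (`kinesis_asl_coordinate` and `sparkmagic_configure_cell`, both hypothetical names) only build that cell's text. The version numbers are placeholders and must match the cluster's actual Spark and Scala builds.

```python
import json
from typing import List


def kinesis_asl_coordinate(spark_version: str, scala_version: str) -> str:
    """Maven coordinate of Spark's Kinesis connector.

    Assumption: this connector is the jar missing from the session classpath.
    """
    return f"org.apache.spark:spark-streaming-kinesis-asl_{scala_version}:{spark_version}"


def sparkmagic_configure_cell(packages: List[str]) -> str:
    """Build the text of a SparkMagic `%%configure -f` cell that asks Livy to
    pull the given Maven packages (comma-separated) when the session restarts."""
    body = {"conf": {"spark.jars.packages": ",".join(packages)}}
    return "%%configure -f\n" + json.dumps(body, indent=2)


if __name__ == "__main__":
    # Placeholder versions: replace with the cluster's real Spark/Scala versions.
    coordinate = kinesis_asl_coordinate("2.4.3", "2.11")
    print(sparkmagic_configure_cell([coordinate]))
```

The printed text would be pasted as the first cell of the notebook, before any Spark code runs. It is not guaranteed that the AWS-managed session behind the notebook honours `spark.jars.packages`.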