I am working through LinkedIn Learning's Data Engineering Foundations course, and I cannot run the exercise files. One of them is:

```
# import required libraries
import pyspark


# create the Spark session, adding the PostgreSQL JDBC driver to the classpath
spark = pyspark.sql.SparkSession \
   .builder \
   .appName("Python Spark SQL basic example") \
   .config('spark.driver.extraClassPath', "/Users/harshittyagi/Downloads/postgresql-42.2.18.jar") \
   .getOrCreate()


# read the movies table from the database using Spark JDBC
movies_df = spark.read \
   .format("jdbc") \
   .option("url", "jdbc:postgresql://localhost:5432/etl_pipeline") \
   .option("dbtable", "movies") \
   .option("user", "<username>") \
   .option("password", "<password>") \
   .option("driver", "org.postgresql.Driver") \
   .load()

# add code below: read the users table the same way
user_df = spark.read \
   .format("jdbc") \
   .option("url", "jdbc:postgresql://localhost:5432/etl_pipeline") \
   .option("dbtable", "users") \
   .option("user", "<username>") \
   .option("password", "<password>") \
   .option("driver", "org.postgresql.Driver") \
   .load()

# print the users dataframe (show() prints the rows itself and returns None)
user_df.show()
```

Running it always raises `ModuleNotFoundError: No module named 'pyspark'`, even though I have tried installing PySpark through both pip and brew.
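One possible cause, and this is only an assumption about the setup, is that pip or brew installed the package for a different Python interpreter than the one running the script. A minimal sketch of a check for that is below; `describe_module` is a hypothetical helper name, not part of the course files. It reports which interpreter is executing and whether that interpreter can locate `pyspark`.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter_path, module_location_or_None) for a top-level module."""
    # find_spec returns None when the running interpreter cannot locate the module
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


interpreter, location = describe_module("pyspark")
print("Interpreter:", interpreter)
print("pyspark found at:", location if location else "not installed for this interpreter")
```

If the module is reported as missing, running `python -m pip install pyspark` with that same interpreter installs it where the script can see it.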

Festo