I am currently going through LinkedIn Learning's Data Engineering Foundations course and I cannot run the files. One such file is:
```python
# Import required libraries
import pyspark

# Create the Spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.driver.extraClassPath', "/Users/harshittyagi/Downloads/postgresql-42.2.18.jar") \
    .getOrCreate()

# Read the movies table from the database using Spark JDBC
movies_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/etl_pipeline") \
    .option("dbtable", "movies") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# Add code below
user_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/etl_pipeline") \
    .option("dbtable", "users") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# Print the users dataframe (show() prints the rows itself and returns None)
user_df.show()
```
Running it keeps raising `ModuleNotFoundError: No module named 'pyspark'`, even though I have tried installing pyspark through both pip and brew.
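
One way to narrow this down is to check which Python interpreter is actually running the script and whether that interpreter can see pyspark. This is only a diagnostic sketch, and it assumes the cause is that pip or brew installed pyspark into a different Python than the one executing the file, which may not be the case here. The helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module location or None) for a top-level module name."""
    # find_spec returns None when the running interpreter cannot locate the module.
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == "__main__":
    interpreter, location = describe_module("pyspark")
    print("Interpreter:", interpreter)
    print("pyspark found at:", location if location else "not installed for this interpreter")
```

If the second line reports that pyspark is not installed, then installing it with that same interpreter (for example `python -m pip install pyspark`, where `python` is the path printed on the first line) should put the package where the script can find it.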