
I'm new to Spark, working with PySpark 3.0.1 (the Python version used by Spark is 3.6.x). I have SQL files containing MERGE statements stored in Google Cloud Storage. I am trying to pass those SQL files to spark.sql. Can someone help me with how I can achieve this using SparkSession?

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample").getOrCreate()

df = spark.read.load("TERR.txt")

df.createTempView("example")

I want to pass this SQL by reading it from a .sql file located in a Google Storage bucket, and the .sql file will have multiple lines:

df2 = spark.sql("SELECT * FROM example")
@MykolaZotko IMO it's not a dupe, as it involves reading the SQL file from a storage object (GCS), and none of the answers in the linked question provides a solution for this. – blackbishop Mar 29 '21 at 09:28

1 Answer


In order to read a .sql file from a GCS bucket, you could try the following:

from pyspark.sql import SparkSession
from google.cloud import storage  # Requires: pip install --upgrade google-cloud-storage

spark = SparkSession.builder.appName("sample").getOrCreate()

# Download the example.sql file from the bucket to a local file
storage_client = storage.Client("[Your project name here]")
bucket = storage_client.get_bucket("[Your bucket name here]")
blob = bucket.blob("folder_one/foldertwo/example.sql")
blob.download_to_filename("example.sql")

# Run each query in the downloaded example.sql
with open("example.sql") as fr:
    queries = fr.readlines()

for line in queries:
    df = spark.sql(line.strip())
    
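Note that readlines() issues one spark.sql call per physical line, which will fail for statements that span multiple lines, as the asker's MERGE statements do. A minimal sketch of an alternative, assuming statements are terminated by semicolons (the helper name split_sql_statements is my own, and the split is naive: it does not handle semicolons inside string literals):

```python
def split_sql_statements(sql_text):
    # Split a SQL script into individual statements on ";",
    # dropping empty fragments and surrounding whitespace.
    return [s.strip() for s in sql_text.split(";") if s.strip()]

# Example with a multi-line script:
script = """
CREATE TABLE t AS SELECT 1 AS id;
MERGE INTO target USING source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET target.val = source.val;
"""

statements = split_sql_statements(script)
for stmt in statements:
    # Each complete statement would be passed to Spark here:
    # spark.sql(stmt)
    pass
```

Reading the whole file once and splitting on statement terminators keeps multi-line statements intact, whereas the per-line loop above only works if each query fits on a single line.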

davidmesalpz