
How do I send a dataframe as an argument to a Python script with spark-submit using subprocess? I tried the code below, but it did not work because we can't concatenate a string and an object.

def spark_submit(self, test_cases, email):
    command = 'spark-submit TestRunner.py ' + test_cases + " " + email
    print(command)
    process = subprocess.Popen([command], shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, error = process.communicate()
    status = process.returncode
    print(status)

1 Answer


You can't concatenate a string with anything that isn't a string (or cast to one). You most likely can't pass a dataframe directly as a command-line argument either, so I suggest writing it to a file and passing the file path instead of the dataframe itself.

df.to_csv('mydf.csv')
command = 'spark-submit TestRunner.py mydf.csv ' + email 
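Putting the two pieces together, here is a minimal sketch of the suggested approach, assuming `df` is a pandas DataFrame and that `TestRunner.py` reads the CSV path and email from `sys.argv`. The file name `test_cases.csv` and the helper `build_command` are arbitrary names for this illustration:

```python
import subprocess

def build_command(csv_path, email):
    # Every element is already a string, so no str/object
    # concatenation error can occur.
    return ['spark-submit', 'TestRunner.py', csv_path, email]

def spark_submit(df, email):
    csv_path = 'test_cases.csv'       # arbitrary scratch file for this sketch
    df.to_csv(csv_path, index=False)  # serialise the dataframe to disk
    command = build_command(csv_path, email)
    # Passing a list of arguments (no shell=True) avoids quoting
    # problems if the email or path contains spaces.
    result = subprocess.run(command, capture_output=True, text=True)
    print(result.returncode)
    return result.returncode
```

On the other side, `TestRunner.py` would recover the dataframe with something like `df = pd.read_csv(sys.argv[1])` and read the email from `sys.argv[2]`.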