0

I have to execute the following query against Hive from Python:

SELECT * FROM user WHERE age > ${hiveconf:AGE} 

As for now I have the following working code snippet:

import pyhs2
with pyhs2.connect(host='localhost',
                   port=60850,
                   authMechanism="PLAIN",
                   user='hduser',
                   database='default') as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FRPM user WHERE age > ?", 10)

So I can pass parameters to a query with PyHs2. But how can I perform variable substitution from Python code in order to not change the original query (i.e. replace ${hiveconf:AGE} with some value in a clean manner)?

Aliaxander
  • 2,547
  • 4
  • 20
  • 45

2 Answers2

2

something like this?:

def get_sql(substitution="${hiveconf:AGE}"):
    sql = "select * from bla where blub > {variable}"
    sql = sql.format(variable=substitution)
    return sql

result:

get_sql()
"select * from bla where blub > ${hiveconf:AGE}"

get_sql("test")
"select * from bla where blub > test"

for more details on the format syntax look here: https://docs.python.org/2/library/string.html#format-string-syntax

Serbitar
  • 2,134
  • 19
  • 25
  • But how about this '$' sign? – Aliaxander Jan 25 '16 at 15:01
  • you can add anything you want. I added $. – Serbitar Jan 25 '16 at 15:03
  • I got the idea, but I there is a problem with the colon between 'hiveconf' and paramname: ${hiveconf:paramname}. Seems that python doesn't allow to use colon in variable name... – Aliaxander Jan 25 '16 at 15:08
  • well, do you need to call it hiveconf:AGE in the first place? I thought you want to replace it with something else anyway, so you can use another variable name and replace it with "hiveconf:AGE" if needed. I updated the original. – Serbitar Jan 25 '16 at 15:27
  • Ok, seems that I should change the variable name to one that doesn't contain colon – Aliaxander Jan 28 '16 at 10:17
  • Actually I don't need to paste ${hiveconf:AGE} into the query, it's already here. I need to replace it with some parameter. – Aliaxander Jan 28 '16 at 12:40
  • well if you really need to replace this just use sgl.replace("${hiveconf:AGE} ", "somethingelse") – Serbitar Jan 28 '16 at 17:08
1

you can use subprocess in python. You can store the sql in a seprate file and execute it using the below format. You can add more variables as well

 import subprocess
 value1=your_value
 p=subprocess.Popen("hive -f /sql/file/location/script.hql"+" --hiveconf variable1="+value1,shell=True,
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE)
 out, err = p.communicate()

 if err==None:
    print "successfull"
else:
    print "not successfull"

Or if you want to execute it the pyhs2 way below is the format for your execute statement.

 cur.execute("SELECT * FROM user WHERE age > %d"% 10)