I have a code pipeline where I'm using PyHive to insert data into a Hive table.
from pyhive import hive

def save_postprocess_data(postprocess_data):
    conn = hive.Connection(host="hostname", port=10000, username="username")
    curr = conn.cursor()
    insert = ""
    for i in range(postprocess_data.shape[0]):
        insert = insert + ",('%s','%d','%d','%s','%d','%s','%.2f','%s','%s','%d','%s')" % (
            postprocess_data.iloc[i, 0], postprocess_data.iloc[i, 1],
            postprocess_data.iloc[i, 2], postprocess_data.iloc[i, 3],
            postprocess_data.iloc[i, 4], postprocess_data.iloc[i, 5],
            postprocess_data.iloc[i, 6], postprocess_data.iloc[i, 7],
            postprocess_data.iloc[i, 8], postprocess_data.iloc[i, 9],
            postprocess_data.iloc[i, 10])
    insert_query = "insert into table table_name PARTITION (date) values" + insert[1:]
    curr.execute(insert_query)
    conn.close()
    return None
The entire query gets printed to the application log, even though I never log it myself:
12/17/2018 07:59:21 AM USE `default`
12/17/2018 11:55:03 AM USE `default`
12/17/2018 11:55:03 AM insert into table table_name PARTITION (date) values("HUGE LIST OF VALUES")
I have the following config for the logger:
logging.basicConfig(filename=root_dir+'/application.log',format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p',level=logging.INFO)
The problem is that I have around 30M records that need to be inserted, so the log is flooded with the values from the query.
I would like to suppress the full query text and log only something like:
logging.info("query successfully inserted %d values into the table",no_of_records)
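For context, here is a sketch of the suppression I'm hoping for. It assumes PyHive names its loggers after its modules (the usual `logging.getLogger(__name__)` convention, which would give `pyhive.hive`) and that those records are what propagate up to the root logger configured by `basicConfig`:

```python
import logging

# basicConfig configures the ROOT logger; records from library loggers
# (such as PyHive's) propagate up to it, which is why the query text
# appears in application.log without me calling the logger myself.
logging.basicConfig(
    format='%(asctime)s %(message)s',
    datefmt='%m/%d/%Y %I:%M:%S %p',
    level=logging.INFO,
)

# Raise the level on PyHive's logger hierarchy so its INFO-level query
# echoes (the "USE `default`" and full INSERT lines) are dropped, while
# warnings and errors from the library still get through.
logging.getLogger('pyhive').setLevel(logging.WARNING)

# My own application messages are unaffected and still logged at INFO:
no_of_records = 30000000
logging.info("query successfully inserted %d values into the table",
             no_of_records)
```

The idea is that setting the level on the parent `pyhive` logger also governs child loggers like `pyhive.hive`, so only my summary line reaches the log file.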