6

I want to convert my query output to a python data frame to draw Line graph

import prestodb
import pandas as pd

conn=prestodb.dbapi.connect(
host='10.0.0.101',
port=8081,
user='hive',
catalog='hive',
schema='ong',
)

cur = conn.cursor()

query="SELECT dtime,tagName FROM machine where tagname is not null 
limit 1000"

cur.execute(query)

rows = cur.fetchall()

print(rows)

df = pd.DataFrame(query, columns=['x_axis','tagName'])

This is my sample output from query

[['2018-09-08 00:00:00.000', 26], ['2018-09-08 01:00:00.000', 26], 
['2018-09-08 02:00:00.000', 26], ['2018-09-08 03:00:00.000', 27], 
['2018-09-08 04:00:00.000', 27], ['2018-09-08 05:00:00.000', 27]]

how to convert this query output to a data frame using python

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
Naveen Vinayak
  • 61
  • 1
  • 1
  • 2

3 Answers3

6

It's very simple, I would suggest you to use pyhive.presto connector (see: https://github.com/dropbox/PyHive), to connect to presto, but also the one you use should work the same way.

Then you have a couple of options:

1 - Use presto connection and pandas read_sql_query

2 - Use presto cursor and use the output of fetchall as input data of the dataframe.

# option 1
import pandas as pd
from pyhive import presto

connection = presto.connect(user='my-user', host='presto.my.host.com', port=8889)

df = pd.read_sql_query("select 100", connection)

print(
    df.head()
)

or

# option 2
import pandas as pd
from pyhive import presto

connection = presto.connect(user='my-user', host='presto.my.host.com', port=8889)
cur = connection.cursor()

cur.execute("select 100") 

df = pd.DataFrame(cur.fetchall())

print(
    df.head()
)
Hammond95
  • 556
  • 7
  • 20
0

df = pd.DataFrame(cur.fetchall()) print(df)

kvr
  • 9
  • 3
0

To get data and column names into a dataframe, use this:

df = pd.DataFrame(cur.fetchall(), columns=[i[0] for i in cur.description])

scottlittle
  • 18,866
  • 8
  • 51
  • 70