0

Using PyMySQL python 3.6.3 versions, getting an DictCursor, and then fetchall(). I get all the data and .description says: (('recdate', 12, None, 19, 19, 0, False), ('outdoorhumidity', 246, None, 9, 9, 3, True), ('outdoortemperature', 246, None, 9, 9, 3, True)).

Printing the rows I get, f.ex:

2005-12-31 23:12:00    89.000   -6.667
2005-12-31 23:13:00    89.000   -6.667
2005-12-31 23:15:00    89.000   -6.650
2005-12-31 23:16:00    89.000   -6.650
2005-12-31 23:17:00    89.000   -6.640

Note the missing minute ...23:14:00 - but I do this for a bigger missing data imputations project. So this is sample data around the missing data. Via the dictionary I want to get the incomplete time series as well as f ex 3rd column, in the best way -simple easy readable code? Do I in each case have to know how many rows there are?

import pymysql

dbServerName = "127.0.0.1"
dbUser = "root"
dbPassword = "mypwd"
dbName = "dbname"
charSet = "utf8"
cursorType = pymysql.cursors.DictCursor

connectionObject = pymysql.connect(host=dbServerName, user=dbUser, password=dbPassword,
                                     db=dbName, charset=charSet,cursorclass=cursorType)

try:
    cursorObject = connectionObject.cursor()                                     

    sqlQuery = "SELECT recdate, outdoorhumidity, outdoortemperature FROM mytable WHERE recdate BETWEEN '2005-12-31 23:12:00' AND '2005-12-31 23:17:00';"


    cursorObject.execute(sqlQuery)

    #Fetch all the rows - within the cursor? Can this be done?
    rows = cursorObject.fetchall()

    print(cursorObject.description)

    for row in rows:
        print(row["recdate"], row["outdoorhumidity"], row["outdoortemperature"])   

except Exception as e:
    print("Exeception occured:{}".format(e))

finally:
    cursorObject.close()
    connectionObject.close()
OldBadger
  • 1
  • 4
  • Please try to add the some code from your project so that we have context. Also, much of this could use inline-formatting. See [how to ask](https://stackoverflow.com/help/how-to-ask). – Jacob Birkett Mar 01 '18 at 20:56
  • By "f ex 3rd column" do you mean the missing column that should be at the third place (index 2)? – Jacob Birkett Mar 01 '18 at 21:02
  • I want to treat "recdate" and "outdoortemperature" as a "pair" in ML so I get new values for imputations into table, later on – OldBadger Mar 01 '18 at 21:56
  • What do you mean by "pair"? Do you want dictionary of the two values, using `recdate` and `outdoortemperature` as keys? – Jacob Birkett Mar 01 '18 at 22:14
  • spikepaz - I really do not know - I want something, as simply as possible that can be combined where I put in a new "record" with only recdate -i.e. a datetime and all other fields are empty i.e. NaN, where ML can see the surrounding data, before and after the NaN(s) in column 3, and compute the data to then be imputed - inserted into the database. A missing record will have to be replaced by calculated values for 7 base data and then a series of calculated fields. I.e. temp hum press will then become dewpoint in the same row/record. – OldBadger Mar 01 '18 at 22:43
  • Try wrapping `cursorObject.description` in `dict`, so it becomes `print(dict(cursorObject.description))`. This will allow the first item of each tuple to be accessed by key. See [the docs](https://docs.python.org/3/tutorial/datastructures.html#dictionaries). Let me know if that helps with the data format that you're looking for. – Jacob Birkett Mar 01 '18 at 22:59
  • spikespaz - Thanks, now I understand that the fetchall() just gets a list of dicts. Investigating better overall performance. Considering getting everything data from the db to handling missing data with pandas (numpy handles only linear data - isn't it). Weather data could be either be missing data uphill or downhill or at the top of the hill or in the valley -or flavours... My head is already spinning with all possible loops from db to handling data from 'arrays'. I'd better separate the datetime data too, arrays are only homogenous in python - no mix of types!? – OldBadger Mar 02 '18 at 16:23
  • Getting there - taking a shortcut, direct from mysql/pymysql via pandas.read_sql_query into pandas own DataFrame. Can do concatenation of several Dataframes, according to sources it is so speedy... among sources https://gist.github.com/stefanthoss/364b2a99521d5bb76d51 Haven't tested yet, but looks very promising! Thanks for earlier inputs which made me understand the quirks with list of dicts. – OldBadger Mar 02 '18 at 17:58

1 Answers1

0

The things I wanted to achieve was much simpler to use with pandas, sqlalchemy, datetime and time. And also made commentes around differerent ways to go about referencing Pandas DataFrames.

Some things are of cause a bit special to just my case, but can be interesting for others, especially to play around with the use of .iloc and .loc, and type convertions handling time.

# Python version
# '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 
       bit (AMD64)]'
# Running Spyder IDE version 3.2.6
# PANDAS VERSION '0.22.0'enter code here
import pandas as pd
import time
from datetime import datetime
from sqlalchemy import create_engine



def GetSqlData(begdatetime,enddatetime):
    temp_df = pd.read_sql_query("""SELECT recdate, outdoortemperature
                   FROM osterasen 
                   WHERE recdate BETWEEN 
                   %(c1)s
                   AND
                   %(c2)s """,
                   engine,
                   params={'c1': begdatetime, 'c2': enddatetime}
                   )
    temp_df['recdate'] = temp_df['recdate'].astype('datetime64[ns]')
    return temp_df

iso_datetimeformat = "%Y-%m-%d %H:%M:%S"

# BEGIN - CREATE MOCK MISSING DATES
# would have to come from the WeatherData.missings table
print("Step 0.0: ")
print("     Get missing data info from weatherdata.missings...")

num_of_missings=21
range_good_m = num_of_missings * 2
print(range_good_m)
range_good_s = range_good_m * 60
print(range_good_s)
a_complete_end_str = '2005-09-24 09:09:00'

b_complete_beg_str = '2005-09-24 09:31:00'

a_complete_end = time.mktime(datetime.strptime(a_complete_end_str, 
    iso_datetimeformat ).timetuple())
print("a_complete_end: ", a_complete_end)

b_complete_beg = time.mktime(datetime.strptime(b_complete_beg_str, i 
iso_datetimeformat ).timetuple())
print("b_complete_beg: ", b_complete_beg)

a_complete_beg = a_complete_end - range_good_s 
print("a_complete_beg: ", a_complete_beg)

b_complete_end = b_complete_beg + range_good_s
print("b_complete_end: ", b_complete_end)
print("b_complete_end: type: ", type(b_complete_end))

m_missing_beg = a_complete_end + 60
m_missing_end = b_complete_beg - 60
m_missing_beg_str = str(datetime.fromtimestamp(m_missing_beg))
m_missing_end_str = str(datetime.fromtimestamp(m_missing_end))

a_complete_beg_str = str(datetime.fromtimestamp(a_complete_beg ))
b_complete_end_str = str(datetime.fromtimestamp(b_complete_end ))

# Print out the ranges
print(a_complete_beg_str)
print(a_complete_end_str)
print(m_missing_beg_str)
print(m_missing_end_str)
print(b_complete_beg_str)
print(b_complete_end_str)

# END   - CREATE MOCK MISSING DATES

connection_str = 
    'mysql+pymysql://root:mypassword@127.0.0.1:3306/weatherdata'

engine = create_engine(connection_str,encoding='utf8')




print("Step 1.0: ")
df_a = GetSqlData( a_complete_beg_str, a_complete_end_str )
print(type(df_a),"   ", type(df_a.head))
print(df_a)
x1 = df_a.loc[5, 'outdoortemperature']
y1 = df_a.loc[5, 'recdate']
print("x1 = ", x1, type(x1))
print("y1 = ", y1, type(y1))



print("Step 2.0: ")
df_b = GetSqlData( b_complete_beg_str, b_complete_end_str )
print(type(df_b),"   ", type(df_b.head))
print(df_b)
x2 = df_b.loc[5, 'outdoortemperature']
y2 = df_b.loc[5, 'recdate']
print("x2 = ", x2, type(x2))
print("y2 = ", y2, type(y2))
OldBadger
  • 1
  • 4