How can I efficiently find the last time stamp (from the Datetime column) for each unique value in the SecurityID column? There are roughly 1000 unique values in the SecurityID column.
Currently I query the whole table for each unique value in SecurityID, then I look up the last time stamp. As you can probably imagine its excruciatingly slow. The table is over 40GBs and growing.
I do something like this:
os.chdir('E:\HDFStores')
store = pd.HDFStore('mysuperawesomehdfstore.h5')
assets = skBase.bbg_helper_assets('minutely')
df_timestamp = pd.Dataframe()
tags = ['T', 'B', 'A']
for asset in assets:
for tag in tags:
print asset, " ", tag
timestamp = (store.select('table', where = "SecurityID = ['" + asset + "'] & Tag = ['" + tag + "'] & columns = ['Datetime']")).tail(1)
if len(timestamp_.index) == 0:
print "value DNE"
else:
dt = (str(timestamp_iloc[0][0])).split(' ', 1)[0]
tm = (str(timestamp_iloc[0][0])).split(' ', 1)[1]
I have thought about running separate python processes in my 4 core machine. But I'd rather have a cleaner way of doing things than resorting to that hack.
Any ideas would be appreciated.