The database design is far from optimum but I have to deal with it and now I'm really stuck.
Edit: I'm using cx_Oracle
OK so this is my query:
query="select degree, spectraldev.event.eventnumber \
from spectraldev.degree \
join spectraldev.alignment on \
(spectraldev.version_id = alignment.version_id) \
join spectraldev.event on \
(alignment.timestamp between event.eventstart and event.eventstop) \
join spectraldev.eventsetup on \
(spectraldev.event.eventsetup = spectraldev.eventsetup.oid) \
where spectraldev.event.eventnumber>=" + options.start + " AND spectraldev.event.eventnumber<=" + options.stop + " AND \
HITS>=" + str(options.minimum_hits)+" \
order by spectraldev.event.eventnumber"
db_cursor.execute(query)
Which returns a bunch of degree
s (12.34 etc.) for many events, which are identified by a unique number (eventnumber
like 346554).
So I get a table like this:
454544 45.2
454544 12.56
454544 41.1
454544 45.4
454600 22.3
454600 24.13
454600 21.32
454600 22.53
454600 54.51
454600 33.87
454610 32.7
454610 12.99
And so on…
Now I need to create a dictionary with the average degree for each event (summing up all corresponding floats and dividing by the number of them).
I think this could be done in SQL but I just can't get it work. At the moment I'm using python to do this, but the fetch command takes 1-2 hours to complete about 2000 Events, which is far too slow, since I need to process about 1000000 events.
This is my fetching part, which takes so long:
_degrees = []
for degree, eventNumber in cursor.fetchall():
_degrees.append([eventNumber, degree])
and then sorting (which is really fast, < 1sec) and calculating averages (also really fast):
_d={}
for eventNumber, degree in _degrees:
_d.setdefault(eventNumber, []).append(degree)
for event in events:
_curDegree = _degrees[int(event)]
_meanDegree = sum(_curDegree) / float(len(_curDegree))
meanDegrees.append(_meanDegree)
Is there a way to do the python part in SQL?