0

I have been trying to optimize the two following nested loops:

def startbars(query_name, commodity_name):

     global h_list
     nc, s, h_list = [], {}, {}
     query = """ SELECT wbcode, Year, """+query_name+""" 
                 FROM innovotable WHERE commodity='"""+commodity_name+"""' and

                 """+query_name+""" != 'NULL' """
     rows = cursor.execute(query)
     for row in rows:
         n = float(row[2])
         s[str(row[0])+str(row[1])] = n
         nc.append(n)
     for iso in result:
         try:
             for an_year in xrange(1961, 2031, 1):
                 skey = iso+str(an_year)
                 h_list[skey] = 8.0 / max(nc) * s[skey]
         except:
             pass

Any ideas? Thanks.

SilentGhost
  • 307,395
  • 66
  • 306
  • 293
relima
  • 3,462
  • 5
  • 34
  • 53
  • 2
    You might want to use bound parameters to fix the sql injection issue, and I'm suspicious about the != 'NULL' instead of IS NOT NULL – Marco Mariani Oct 01 '10 at 16:29
  • This is my actual code. The sql query takes a long time, but the loops also take forever. I really don't know what can be done on it. – relima Oct 01 '10 at 16:29
  • 2
    so where does "result" come from? – Marco Mariani Oct 01 '10 at 16:30
  • what sort of error are you excepting for? – SilentGhost Oct 01 '10 at 16:32
  • it is basically a list that is generated before, – relima Oct 01 '10 at 16:38
  • result = re.findall(r"...(?=-)", str(vr_ctrs.getNodeNames())) – relima Oct 01 '10 at 16:38
  • 1
    Your variable names are terrible... You should really consider more meaningful ones. Also you have yourself wide open to SQL injection attacks. You *need* to learn about parameterization before you get your data compromised -- if it isn't already. – Daenyth Oct 01 '10 at 16:41
  • and an explicit 'except KeyError' if that's what you want to catch. – Marco Mariani Oct 01 '10 at 16:43
  • the queries are going to run on a compiled pyexe file, so I don't really care about injection attacks on a local sqlite db file. I agree that I need to work on better names for my variables though. – relima Oct 01 '10 at 16:55

1 Answers1

5

Your code isn't complete which makes it hard to give good advice but:

  1. Inner loop doesn't depend on outer-loop, so pull it out of the outer loop.
  2. max(nc) is a constant after first loop, so pull it out of the loops.

Also you need to know how slow the current code is, and how fast you need it to be, otherwise your optimisations maybe misplaced.

Your datastructures are all messed up. Maybe something list this would be faster:

def startbars(query_name, commodity_name):

    assert query_name in INNOVOTABLE_FIELD_NAMES

    ## TODO: Replace with proper SQL query
    query = """ SELECT wbcode, Year, """+query_name+""" 
             FROM innovotable WHERE commodity='"""+commodity_name+"""' and

             """+query_name+""" != 'NULL' """
    rows = cursor.execute(query)

    mapYearToWbcodeToField = {}
    nc = []
    global h_list
    h_list = {}

    for row in rows:
        n = float(row[2])
        wbCodeToField = mapYearToWbcodeToField.setdefault(int(row[1]),{})
        wbCodeToField[str(row[0])] = n
        nc.append(n)

    constant = 8.0 / max(nc)


    for (an_year,wbCodeToField) in mapYearToWbcodeToField.iteritems():
        if an_year < 1961 or an_year > 2031:
            continue

        for (wbCode,value) in wbCodeToField.iteritems():
            if wbCode not in result:
                continue

            skey = wbCode+str(an_year)
            h_list[skey] = constant * value

Or moving all checks into the first loop:

def startbars(query_name, commodity_name):

    assert query_name in INNOVOTABLE_FIELD_NAMES

    ## TODO: Replace with proper SQL query
    query = """ SELECT wbcode, Year, """+query_name+""" 
             FROM innovotable WHERE commodity='"""+commodity_name+"""' and

             """+query_name+""" != 'NULL' """
    rows = cursor.execute(query)

    data = []
    maxField = None

    for row in rows:
        an_year = int(row[1])
        if an_year < 1961 or an_year > 2031:
            continue

        wbCode = str(row[0])
        if wbCode not in result:
            continue

        n = float(row[2])

        data.append((wbCode+str(an_year),n))
        if maxField is None or n > maxField:
            maxField = n

    constant = 8.0 / maxField

    global h_list
    h_list = {}

    for (skey,n) in data:
        h_list[skey] = constant * n
Douglas Leeder
  • 52,368
  • 9
  • 94
  • 137
  • Oh! Omg!! It worked out of the box (I had to remove the assert line) but it made the program react instantaneously! Thank you very much. It reduce the time to 1/8 of the original time!!! Wow! – relima Oct 01 '10 at 17:09
  • 1
    Don't remove the assert... Instead define INNOVOTABLE_FIELD_NAMES to be a list of the fields of the innovotable table. – Douglas Leeder Oct 01 '10 at 17:14