
I am using a for loop to iterate through a pandas dataframe and append values to separate lists to then convert back to another pandas df. However, I am receiving an index out of range error when trying to call a value that has been previously appended to a list in the same iteration. NOTE: all lists (date, dow, rt, etc.) have been initialized prior to this code. Didn't want to make the post too long.

Here's my code:

#Iterate through df to append to initialized lists
#list.append: amortized O(1) per call vs df.append: O(n) per call, O(n^2) overall
#Append to lists, then create df2 from the completed lists
for i, row in df.iterrows():
        date.append(str(df.at[i, 'Route Date'])[0:10])
        dow.append(str(df.at[i, 'Route Number'])[0])
        rt.append(str(df.at[i, 'Route Number'])[1:4])
        #Append route type and municipality using lookup table
        #Lookup table will be packaged with PyInstaller
        for j, row in lookup.iterrows():
            if df.at[i, 'Route Number'] == lookup.at[j, 'Route']:
                rt_type.append(str(lookup.at[j, 'Route Type']))
                muni.append(str(lookup.at[j, 'Municipality']))
        miles.append(df.at[i, 'Miles'])
        disp_tons.append(df.at[i, 'Disposal Tons'])
        disp_loads.append(df.at[i, 'Disposal Loads'])
        stops.append(df.at[i, 'Stops'])
        clk_hrs.append(df.at[i, 'Clock Hours'])
        travel.append((miles[i]) / 22)
        #Service time varies by truck type and municipality
        if rt_type[i] == 'AFEL' and muni[i] == 'Vestavia Hills':
            service.append((stops[i]*17)/3600)
        disp.append((disp_loads[i])*(22/60))
        pre_post.append("0.78")
        target_clk_hrs.append(travel[i] + service[i] + disp[i] + 1.28)
        variance.append(target_clk_hrs[i] - clk_hrs[i])
        continue

The index error is occurring at target_clk_hrs.append(travel[i] + service[i] + disp[i] + 1.28) when calling the value of service[i]. When running this without the if statement: if rt_type[i] == 'AFEL' and muni[i] == 'Vestavia Hills': and instead using only service.append((stops[i]*17)/3600), I run into no indexing errors. I am confused as to why A. this works without the if statement, and B. why travel doesn't run into an index error if service does.

I'm assuming the issue lies with the iterator for j, row in lookup.iterrows(). Note that lookup is a separate lookup table used to return values to some of the lists conditionally. I've tried using a break statement on this loop and am still getting the same error.

My other thought was that rt_type[i] and muni[i] in the if block are not being indexed correctly and therefore service is not being appended, but I haven't been able to come up with a fix for this.

I've also consulted this post to no avail.

sweeney

1 Answer


At the i-th step of the iteration (counting from 0, assuming df has the default integer index), travel[i] is the last element of travel, because your code appends to travel exactly once per iteration, so it holds i + 1 elements at that point.

However, insertions into service are conditional:

if rt_type[i] == 'AFEL' and muni[i] == 'Vestavia Hills':
    service.append((stops[i]*17)/3600)

When the condition is not satisfied, the append doesn't happen, and service ends up one element shorter than the lists that are appended to unconditionally.

Right after the first skipped append, service has only i elements (valid indices 0 through i - 1), so service[i] raises an IndexError.
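To make the length mismatch concrete, here is a minimal sketch. The sample values and the helper name service_times are made up for illustration; only the condition and the formula come from your code. Appending a placeholder in the "no match" case is one possible way to keep the lists aligned, not necessarily the right business rule for your data.

```python
import math

# Hypothetical sample data standing in for the lists built in the question.
stops = [120, 95, 140]
rt_type = ['AFEL', 'ASL', 'AFEL']
muni = ['Vestavia Hills', 'Hoover', 'Vestavia Hills']


def service_times(stops, rt_type, muni, pad_missing):
    """Build the service list, optionally padding rows that fail the condition."""
    service = []
    for i in range(len(stops)):
        if rt_type[i] == 'AFEL' and muni[i] == 'Vestavia Hills':
            service.append((stops[i] * 17) / 3600)
        elif pad_missing:
            # Placeholder keeps service the same length as the other lists.
            service.append(math.nan)
    return service


unpadded = service_times(stops, rt_type, muni, pad_missing=False)
padded = service_times(stops, rt_type, muni, pad_missing=True)

print(len(unpadded), len(padded))  # 2 3
```

With the unpadded list, unpadded[2] raises an IndexError even though stops[2] exists, which is the same failure you see at service[i].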

P.S. There are no examples of input and output, so it's hard to say for sure, but your task could probably be solved in a more idiomatic pandas way, avoiding all the lists.
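As a rough sketch of what that could look like: the column names below are taken from your code, but the sample rows are invented, and I'm assuming each route appears at most once in lookup. A left merge replaces the nested loop, and `indicator=True` also shows which routes have no match in the lookup table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    'Route Number': ['M101', 'T202', 'W303'],
    'Miles': [44.0, 66.0, 22.0],
    'Disposal Loads': [2, 1, 3],
    'Stops': [360, 720, 180],
    'Clock Hours': [6.5, 8.0, 5.0],
})
lookup = pd.DataFrame({
    'Route': ['M101', 'T202'],
    'Route Type': ['AFEL', 'ASL'],
    'Municipality': ['Vestavia Hills', 'Hoover'],
})

# Left merge keeps every row of df; '_merge' marks rows missing from lookup.
out = df.merge(lookup, how='left', left_on='Route Number',
               right_on='Route', indicator=True)
unmatched = out.loc[out['_merge'] == 'left_only', 'Route Number'].tolist()

# Vectorized versions of the per-row formulas from the question.
out['travel'] = out['Miles'] / 22
is_afel_vh = (out['Route Type'] == 'AFEL') & (out['Municipality'] == 'Vestavia Hills')
out['service'] = np.where(is_afel_vh, out['Stops'] * 17 / 3600, np.nan)
out['disp'] = out['Disposal Loads'] * (22 / 60)
out['target_clk_hrs'] = out['travel'] + out['service'] + out['disp'] + 1.28
out['variance'] = out['target_clk_hrs'] - out['Clock Hours']

print(unmatched)  # ['W303']
```

Rows that fail the service condition get NaN instead of being skipped, so every column keeps one value per row of df.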

Maria K
  • I've added elifs covering every permutation of `rt_type` and `muni`, as well as an else to append 'NaN' if no conditions are met, so append is guaranteed at each iteration. I'm now running into an index error at `rt_type[i] == 'AEFL' and muni[i] ...`. In the for loop to append `rt_type` and `muni`, I've added `else: continue` which should guarantee append, but obviously isn't. Any suggestions? Also, I'm dealing with lists due to time cost of growing an empty df using pandas. Appending the lists and zipping them to a df is much cheaper. – sweeney Jul 06 '23 at 19:28
  • Update: I've found a discrepancy between df and my lookup table. This was causing indexes to be thrown off when this value wasn't found in the lookup. Currently kicking myself for not checking sooner... Thanks for the help! – sweeney Jul 06 '23 at 20:06