The code below uses two nested loops to process an input DataFrame. The outer loop runs a fixed number of iterations, and the inner loop generates a sequence of random numbers that are compared against a column of the input.
The current approach produces the correct output, but I suspect it could be done more efficiently, particularly when the outer loop runs for more than a few million iterations and the DataFrame has close to a hundred columns.
Is there a trick or two I might be missing that I could try to implement?
import numpy as np
import pandas as pd
# Input df - index is same length as num iterations for inner loop defined below
# 'cumulative' column value is used for comparison against random number inside inner loop
# 'units_A' is useful data captured from each iteration of inner loop that is aggregated after exiting inner loop
df_reference = pd.DataFrame(index=np.arange(1,11,1),data={'cumulative':np.arange(0.1,1.1,0.1),'units_A':np.arange(10,101,10)})
# Variable that determines num rows in output df
num_iterations_outer = 20
# Variable that determines number of iterations for inner loop operation
num_iterations_inner = 10
# Create an empty output df that will be updated at end
df_out = pd.DataFrame(columns=['cumulative','units_A'])
# Using np array for comparison inside loop instead of comparing against column which takes much longer
compare_against_arr = df_reference['cumulative'].values
# Create a list to collect the one-row df's that will become rows of the output df. Appending to a list and calling concat once is faster than calling concat inside the loop
output_df_rows_list = []
for outer_iteration_num in np.arange(num_iterations_outer):
    #current_cumulative_val = 1
    # Rotation num is reset to 1 at the start of every outer iteration
    current_rotation_num = 1
    # Create an empty list to store all rotation_num that are generated from inner loop iteration
    rotations_list = []
    for inner_iteration_num in np.arange(1,num_iterations_inner+1):
        # Get a random number in the half-open interval [0.0, 1.0)
        comparator = np.random.random()
        # Add the current rotation num to the list created before entering inner loop. Use the rotations list to get corresponding units_A after exiting inner loop
        rotations_list.append(current_rotation_num)
        # Compare random num 'comparator' to the cumulative value at array position current_rotation_num (0-based)
        if comparator < compare_against_arr[current_rotation_num]:
            # Reset rotation_num back to 1
            current_rotation_num = 1
        else:
            # Increment rotation_num
            current_rotation_num += 1
    # Look up the reference rows for every rotation visited in this outer iteration
    df_units_A_by_rotation = df_reference.reindex(rotations_list)
    # Sum those rows into a single-row df for this outer iteration
    df_units_A_agg_outer_iter = pd.DataFrame(data=df_units_A_by_rotation.sum()).transpose()
    output_df_rows_list.append(df_units_A_agg_outer_iter)
# Output df is created by concatenating all df stored in list that was updated in outer loop above
df_out = pd.concat(output_df_rows_list)
# Reset index so that it runs from 0 to num_iterations_outer - 1
df_out.index = np.arange(num_iterations_outer)
I appreciate your time, and thank you for taking a look!
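One possible direction, offered as a minimal sketch rather than a definitive answer: each inner step depends on the previous one, but the outer iterations are independent of each other, so the rotation counters for all outer iterations can be advanced together as one NumPy array. The function name `simulate_vectorized` and its `rng` parameter are hypothetical names introduced for illustration. The sketch assumes the index of `df_reference` is the consecutive labels 1..n (so label `r` sits at position `r - 1`), it keeps the positional lookup `compare_against_arr[rotation]` exactly as written in the code above, and it swaps `np.random.random()` for `np.random.default_rng()`, so the random stream will differ from the original.

```python
import numpy as np
import pandas as pd

# Same reference data as in the question
df_reference = pd.DataFrame(
    index=np.arange(1, 11),
    data={"cumulative": np.arange(0.1, 1.1, 0.1), "units_A": np.arange(10, 101, 10)},
)

def simulate_vectorized(df_reference, num_iterations_outer, num_iterations_inner, rng=None):
    """Hypothetical helper: advance all outer iterations in lockstep."""
    rng = np.random.default_rng() if rng is None else rng
    compare_against_arr = df_reference["cumulative"].to_numpy()
    # All reference columns as one 2-D array, shape (n_rotations, n_columns)
    ref_values = df_reference.to_numpy(dtype=float)
    # One rotation counter per outer iteration, all starting at 1
    rotation = np.ones(num_iterations_outer, dtype=np.int64)
    # Running column sums, one row per outer iteration
    totals = np.zeros((num_iterations_outer, ref_values.shape[1]))
    for _ in range(num_iterations_inner):
        # Assumption: index labels are 1..n, so label r is at position r - 1
        totals += ref_values[rotation - 1]
        # One random number in [0.0, 1.0) per outer iteration
        comparator = rng.random(num_iterations_outer)
        # Same positional comparison as the original inner loop
        reset = comparator < compare_against_arr[rotation]
        # Reset to 1 where the comparison succeeded, otherwise increment
        rotation = np.where(reset, 1, rotation + 1)
    return pd.DataFrame(totals, columns=df_reference.columns)

df_out = simulate_vectorized(df_reference, 20, 10, rng=np.random.default_rng(0))
```

This leaves only the short inner loop in Python and builds a single output DataFrame at the end, instead of one `reindex`, `sum` and one-row DataFrame per outer iteration. Whether it is fast enough for millions of outer iterations and around a hundred columns would need to be timed on the real data.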