1

I would like to simulate individual changes in growth and mortality for a variable number of days. My dataframe is formatted as follows...

    import pandas as pd

    data = {'unique_id':  ['2', '4', '5', '13'],
            'length': ['27.7', '30.2', '25.4', '29.1'],
            'no_fish': ['3195', '1894', '8', '2774'],
            'days_left': ['253', '253', '254', '256'],
            'growth': ['0.3898', '0.3414', '0.4080', '0.3839']
           }

    df = pd.DataFrame(data)

    print(df)

      unique_id length no_fish days_left  growth
    0         2   27.7    3195       253  0.3898
    1         4   30.2    1894       253  0.3414
    2         5   25.4       8       254  0.4080
    3        13   29.1    2774       256  0.3839
    

Ideally, I would like the initial length (i.e., length) to increase by the daily growth rate (i.e., growth) for each of the days remaining in the year (i.e., days_left).

    df['final'] = df['length'] + (df['days_left'] * df['growth']

However, I would also like to update the number of fish that each individual represents (i.e., no_fish) on a daily basis using a size-specific equation. I'm fairly new to python so I initially thought to use a for-loop (I'm not sure if there is another, more efficient way). My code is as follows:

# keep track of run time - START
start_time = time.perf_counter()

df['z'] = 0.0
for indx in range(len(df)): 
    count = 1
    while count <= int(df.days_to_forecast[indx]):
   
        # (1) update individual length
        df.lgth[indx] = df.lgth[indx] + df.linearGR[indx]
    
        # (2) estimate daily size-specific mortality 
        if df.lgth[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.lgth[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.lgth[indx])
            elif df.lgth[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.lgth[indx])
    
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
    
        # (3) reduce no. of days left in forecast by 1
        count = count + 1

# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time)) 

The above code now works correctly, but it is still far to inefficient to run for 40,000 individuals each for 200+ days.

I would really appreciate any advice on how to modify the following code to make it pythonic.

Thanks

  • 2
    what is wrong with what you are doing compared to what you expect? – Andrew Ryan Jun 06 '22 at 19:48
  • On another note, if you want your code to look more Pythonic, I would advise you to define variables for the many constants that appear throughout your equations. It makes it easier to understand and debug. – econbernardo Jun 07 '22 at 14:09
  • By the way, there are Python libraries specifically created for agent-based modeling. – econbernardo Jun 07 '22 at 14:13
  • Yes, the for loop is inefficient. The more efficient way would be to use NumPy broadcasting, which is supported by Pandas. For instance, instead of `# (1) update individual length df.lgth[indx] = df.lgth[indx] + df.linearGR[indx]` you could just do `df.lgth = df.lgth + df.linearGR`, etc. – econbernardo Jun 07 '22 at 14:29
  • 1
    @econbernardo do you have any recommendations for a strong Python library? I took a quick glance at agentpy, which looks a lot like a python version of NetLogo. I am interested in seeing what others often use. – STG_fisheries Jun 08 '22 at 15:26
  • @STG_fisheries Unfortunately, I have no experience with agent-based libraries. The community for scientific libraries in Python is very good, though, so I would be surprised if there aren't strong libraries. A quick Google search indicates [Mesa](https://mesa.readthedocs.io/en/latest/) as a possible alternative. – econbernardo Jun 08 '22 at 15:33

2 Answers2

1

Another option that was suggested to me is to use the pd.dataframe.apply function. This dramatically reduced the overall the run time and could be useful to someone else in the future.

### === RUN SIMULATION === ###
start_time = time.perf_counter()          # keep track of run time -- START

#-------------------------------------------------------------------------#
def function_to_apply( df ):

    df['z_instantMort'] = ''           
 
    for indx in range(int(df['days_left'])):
    
        # (1) update individual length
        df['length'] = df['length'] + df['growth']
      
        # (2) estimate daily size-specific mortality 
        if df['length'] > 50.0:
            df['z_instantMort'] = 0.01
        else:
            if df['length'] <= 50.0:
                df['z_instantMort'] = 0.052857-((0.03/35)*df['length'])
            elif df['length'] < 15.0:
                df['z_instantMort'] = 0.728*np.exp(-0.1892*df['length'])
  
        whole_fish = round(df['no_fish'], 0)
        if whole_fish < 1.0:
            df['no_fish'] = 0.0
        elif whole_fish >= 1.0:
            df['no_fish'] = df['no_fish']*np.exp(-(df['z_instantMort']))
          
return df
#-------------------------------------------------------------------------#

sim_results = df.apply(function_to_apply, axis=1)

total_elapsed_time = round(time.perf_counter() - start_time, 2)       # END
print("Forecast iteration completed in {} seconds".format(total_elapsed_time)) 
print(sim_results)
### ====================== ###

output being...

Forecast iteration completed in 0.05 seconds
   unique_id    length     no_fish  days_left  growth  z_instantMort
0        2.0  126.3194  148.729190      253.0  0.3898           0.01
1        4.0  116.5742   93.018465      253.0  0.3414           0.01
2        5.0  129.0320    0.000000      254.0  0.4080           0.01
3       13.0  127.3784  132.864757      256.0  0.3839           0.01
0

As I said in my comment, a preferable alternative to for loops in this setting is using vector operations. For instance, running your code:

import pandas as pd
import time
import math
import numpy as np

data = {'unique_id':  [2, 4, 5, 13],
        'length': [27.7, 30.2, 25.4, 29.1],
        'no_fish': [3195, 1894, 8, 2774],
        'days_left': [253, 253, 254, 256],
        'growth': [0.3898, 0.3414, 0.4080, 0.3839]
       }

df = pd.DataFrame(data)

print(df)

# keep track of run time - START
start_time = time.perf_counter()

df['z'] = 0.0
for indx in range(len(df)): 
    count = 1
    while count <= int(df.days_left[indx]):
   
        # (1) update individual length
        df.length[indx] = df.length[indx] + df.growth[indx]
    
        # (2) estimate daily size-specific mortality 
        if df.length[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.length[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.length[indx])
            elif df.length[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.length[indx])
    
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
    
        # (3) reduce no. of days left in forecast by 1
        count = count + 1

# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)

with output:

   unique_id  length  no_fish  days_left  growth
0          2    27.7     3195        253  0.3898
1          4    30.2     1894        253  0.3414
2          5    25.4        8        254  0.4080
3         13    29.1     2774        256  0.3839
Forecast iteration completed in 31.75 seconds
   unique_id    length     no_fish  days_left  growth     z
0          2  126.3194  148.729190        253  0.3898  0.01
1          4  116.5742   93.018465        253  0.3414  0.01
2          5  129.0320    0.000000        254  0.4080  0.01
3         13  127.3784  132.864757        256  0.3839  0.01

Now with vector operations, you could do something like:

# keep track of run time - START
start_time = time.perf_counter()
df['z'] = 0.0
for day in range(1, df.days_left.max() + 1):
    update = day <= df['days_left']
    # (1) update individual length
    df[update]['length'] = df[update]['length'] + df[update]['growth']

    # (2) estimate daily size-specific mortality
    df[update]['z'] = np.where( df[update]['length'] > 50.0, 0.01, 0.052857-( ( 0.03 / 35)*df[update]['length'] ) )
    df[update]['z'] = np.where( df[update]['length'] < 15.0, 0.728 * np.exp(-0.1892*df[update]['length'] ), df[update]['z'] )
                        
                    
                            
    df[update]['no_fish'].round(decimals = 0)
    df[update]['no_fish'] = np.where(df[update]['no_fish'] < 1.0, 0.0, df[update]['no_fish'] * np.exp(-(df[update]['z'])))                          
# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)    

with output

Forecast iteration completed in 1.32 seconds
   unique_id    length     no_fish  days_left  growth    z
0          2  126.3194  148.729190        253  0.3898  0.0
1          4  116.5742   93.018465        253  0.3414  0.0
2          5  129.0320    0.000000        254  0.4080  0.0
3         13  127.3784  132.864757        256  0.3839  0.0
econbernardo
  • 98
  • 1
  • 7