Inconsistent replication of numpy random generator using random.RandomState(None) or 0

Question

I have had this problem before. At the time, I imported random a number of times. This time I import numpy a single time among all modules.

EDITED: Using None might have been the problem, but it is still not working with 0. A minimal example works, so the cause is deeper in the code and not about importing anything (the example imports the same modules)...

EDITED2: I'm now guessing it has to do with sorting a list?

Not even that: running PYTHONHASHSEED=0 python main.py, as suggested in the question "Disable hash randomization from within python program", worked.
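One plausible reason PYTHONHASHSEED matters: string hashes are randomized per process by default, so the iteration order of a set of strings can differ between runs, and anything drawn from that order inherits the difference even with a fixed seed. The sketch below is only an illustration under that assumption. The function pick_industry and the industry names are hypothetical and do not come from the original code. It sorts the candidates before sampling so the result no longer depends on set ordering.

```python
import numpy as np


def pick_industry(industry_names, seed=0):
    # Hypothetical helper: sort first so the candidate order does not
    # depend on set iteration order (which varies with the hash seed).
    ordered = sorted(industry_names)
    rng = np.random.RandomState(seed)
    return str(rng.choice(ordered))


# Two sets with the same members, written in a different order.
a = pick_industry({"farming", "mining", "services"})
b = pick_industry({"services", "farming", "mining"})
print(a == b)
```

If sorting is not practical, fixing PYTHONHASHSEED in the environment before the interpreter starts is the other option mentioned above.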

EDITED3: By the way, this is not a simple problem as the straight answer may suggest. It's actually a hard problem.

I use the code:

import numpy as np


class Model:
    def __init__(self, seed=0):
        self.np = np
        self.seed = self.np.random.RandomState(seed)

Note the max_distance is exactly the same. But the coefficient numbers are different.

And here (in another module, as I pass self) is how I calculate the gini coefficient:

def calculate_gini(incomes, model):
    # Sort smallest to largest
    cumm = model.np.sort(incomes)
    # Values cannot be 0
    cumm += .00001
    # Find cumulative totals
    n = cumm.shape[0]
    index = model.np.arange(1, n + 1)
    gini = ((model.np.sum((2 * index - n - 1) * cumm)) / (n * model.np.sum(cumm)))
    return gini

So, my question is: what are other possible sources of stochasticity that I'm not seeing? And how can I control them?

All over the code I use random as such:

    def initialize_person(self):
        n = self.params['N_PEOPLE']
        ages = self.seed.randint(self.params['MIN_AGE'], self.params['MAX_AGE'], size=n)
        females = self.seed.choice([True, False], size=n)
        industries = self.seed.choice(list(self.industries.values()), p=[i.size for i in self.industries.values()],
                                      size=n)
        for i in range(n):
            skilled = self.seed.choice([True, False], p=[industries[i].p_skill, (1 - industries[i].p_skill)])
            income = self.seed.lognormal(industries[i].income_mean, industries[i].income_variance)
            person = Person(_id=str(i), age=ages[i], female=females[i], industry=industries[i],
                            skill=skilled, income=income)
            self.persons.append(person)

Details: This was produced on Linux Mint 20.1 Ulyssa (the latest version), with Python 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0] :: Anaconda, Inc. on linux

where did you use the Model class? I only see lower case 'model' — Bing Wang, Apr 12 '21 at 19:18
Just trying to make sure there was a single reference to numpy... @qu — B Furtado, Apr 12 '21 at 19:18

B Furtado · Answer 1 · 2021-04-13T20:27:16.977

0

Seed must not be None.

Correct is:

def main(params, seed=0):
    my_model = Model(params, seed)
    my_model.run()
    my_model.logger.info('All done...')

Incorrect is:

def main(params, seed=None):
    my_model = Model(params, seed)
    my_model.run()
    my_model.logger.info('All done...')

Even though the model was defined as:

class Model:
    def __init__(self, params, seed=0):
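A minimal sketch of why this happens: a default argument is only used when the caller omits the argument, so passing seed=None explicitly overrides the default of 0, and RandomState(None) seeds itself from fresh OS entropy. The Model below is a stripped-down stand-in for the class in the question, and first_draws is a hypothetical helper added for illustration.

```python
import numpy as np


class Model:
    # Stripped-down stand-in for the Model class in the question.
    def __init__(self, seed=0):
        self.seed = np.random.RandomState(seed)


def first_draws(seed_arg):
    # Hypothetical helper: seed_arg is passed explicitly, so it overrides
    # Model's default of 0, even when it is None.
    return Model(seed_arg).seed.randint(0, 1_000_000, size=5).tolist()


same = first_draws(0) == first_draws(0)           # fixed seed: identical draws
differs = first_draws(None) != first_draws(None)  # None: new entropy each time
print(same, differs)
```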

edited Apr 13 '21 at 20:27

answered Apr 13 '21 at 20:10

B Furtado


score -1 · Answer 2 · answered Apr 12 '21 at 19:48

-1

You don't need to store the numpy module as an attribute; just import it in any module that needs it. There are no "multiple instances" of the numpy module.

As for the randomness, did you try putting np.random.seed(seed) at the beginning of your script? If you also use randomness from the python random module, then also do random.seed(seed). With that, you shouldn't need the self.seed attribute of Model, just call the regular np.random.choice etc.
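A minimal sketch of that suggestion, assuming a single process and code that draws only from the global generators. The helpers seed_everything and sample are hypothetical names used for illustration.

```python
import random

import numpy as np


def seed_everything(seed):
    # Seed both global generators: Python's random module and NumPy's legacy one.
    random.seed(seed)
    np.random.seed(seed)


def sample():
    # Draw once from each global generator.
    return random.random(), np.random.choice([True, False], size=4).tolist()


seed_everything(0)
first = sample()
seed_everything(0)
second = sample()
print(first == second)
```

This only controls the two global generators. It does not affect separate RandomState instances or other sources of nondeterminism such as execution order.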

answered Apr 12 '21 at 19:48

maarten


Exactly. I do not use the regular random module. import numpy is not the first thing to be imported. I import ``` import os import sys from joblib import Parallel, delayed import pandas as pd import geopandas as gpd``` and – B Furtado Apr 12 '21 at 19:50
And then import datetime import json import logging import os from math import ceil, – B Furtado Apr 12 '21 at 19:51
np.random.seed(seed) did not work. That's exactly the point. – B Furtado Apr 12 '21 at 19:54

Ah, when you use parallel processing (joblib) it's a very different discussion. Then the source of uncontrolled randomness is most likely the non-deterministic order in which the code is executed. – maarten Apr 12 '21 at 19:58
Actually, I only use it when I run the sensitivity analysis. For the example, although I import it, I do not use it. – B Furtado Apr 12 '21 at 19:59
It's hard to say anything meaningful without a minimal working example. – maarten Apr 12 '21 at 20:04
It may be that some external package you use makes use of python's `random` module. You should probably seed that module even if you do not use it directly in your own code. – maarten Apr 12 '21 at 20:12
Thanks. @maarten I tried to replicate it. With None, the example did not replicate, but it did with 0. In my code, still did not work with 0. So, now I'm guessing it is not about anything I import... – B Furtado Apr 12 '21 at 21:18
