Ensure that the contents of a list sum to 1 for np.random.choice()

Question

The Context

In Python 3.5, I'm making a function to generate a map with different biomes - a 2-dimensional list with the first layer representing the lines of the Y-axis and the items representing items along the X-axis.

Example:

[
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]

This displays as:

A1 B1 C1
A2 B2 C2
A3 B3 C3

The Goal

A given position on the map should be more likely to be a certain biome if its neighbours are also that biome. So, if a given square's neighbours are all Woods, that square is almost guaranteed to be a Woods.

My Code (so far)

All the biomes are represented by classes (woodsBiome, desertBiome, fieldBiome). They all inherit from baseBiome, which is used on its own to fill up a grid.

My code is in the form of a function. It takes the maximum X and Y coordinates as parameters. Here it is:

def generateMap(xMax, yMax):
    areaMap = []  # this will be the final result of a 2d list

    # first, fill the map with nothing to establish a blank grid
    xSampleData = []  # this will be cloned on the X axis for every Y-line
    for i in range(0, xMax):
        biomeInstance = baseBiome()
        xSampleData.append(biomeInstance)  # fill it with baseBiome for now, we will generate biomes later
    for i in range(0, yMax):
        areaMap.append(xSampleData)

    # now we generate biomes
    yCounter = yMax  # because of the way the larger program works. keeps track of the y-coordinate we're on
    for yi in areaMap:  # this increments for every Y-line
        xCounter = 0  # we use this to keep track of the x coordinate we're on
        for xi in yi:  # for every x position in the Y-line
            biomeList = [woodsBiome(), desertBiome(), fieldBiome()]
            biomeProbabilities = [0.0, 0.0, 0.0]
            # biggest bodge I have ever written
            if areaMap[yi-1][xi-1].isinstance(woodsBiome):
                biomeProbabilities[0] += 0.2
            if areaMap[yi+1][xi+1].isinstance(woodsBiome):
                biomeProbabilities[0] += 0.2
            if areaMap[yi-1][xi+1].isinstance(woodsBiome):
                biomeProbabilities[0] += 0.2
            if areaMap[yi+1][xi-1].isinstance(woodsBiome):
                biomeProbabilities[0] += 0.2
            if areaMap[yi-1][xi-1].isinstance(desertBiome):
                biomeProbabilities[1] += 0.2
            if areaMap[yi+1][xi+1].isinstance(desertBiome):
                biomeProbabilities[1] += 0.2
            if areaMap[yi-1][xi+1].isinstance(desertBiome):
                biomeProbabilities[1] += 0.2
            if areaMap[yi+1][xi-1].isinstance(desertBiome):
                biomeProbabilities[1] += 0.2
            if areaMap[yi-1][xi-1].isinstance(fieldBiome):
                biomeProbabilities[2] += 0.2
            if areaMap[yi+1][xi+1].isinstance(fieldBiome):
                biomeProbabilities[2] += 0.2
            if areaMap[yi-1][xi+1].isinstance(fieldBiome):
                biomeProbabilities[2] += 0.2
            if areaMap[yi+1][xi-1].isinstance(fieldBiome):
                biomeProbabilities[2] += 0.2
            choice = numpy.random.choice(biomeList, 4, p=biomeProbabilities)
            areaMap[yi][xi] = choice

    return areaMap

Explanation:

As you can see, I'm starting off with an empty list. I add baseBiome to it as a placeholder (up to xi == xMax and yi == 0) in order to generate a 2D grid that I can then cycle through.

I create a list biomeProbabilities with different indices representing different biomes. While cycling through the positions in the map, I check the neighbours of the chosen position and adjust a value in biomeProbabilities based on its biome.

Finally, I use numpy.random.choice() with biomeList and biomeProbabilities to make a choice from biomeList using the given probabilities for each item.

My Question

How can I make sure that the sum of every item in biomeProbabilities is equal to 1 (so that numpy.random.choice will allow a random probability choice)? There are two logical solutions I see:

a) Assign new probabilities so that the highest-ranking biome is given 0.8, then the second 0.4 and the third 0.2

b) Add or subtract equal amounts to each one until the sum == 1

Which option (if any) would be better, and how would I implement it?

Also, is there a better way to get the result without resorting to the endless if statements I've used here?

I think remram has given a good answer below. I'll only note that this problem in general seems to have something to do with solutions of the diffusion equation, and with 2-dimensional Markov chains. Maybe you can get some inspiration there. — Robert Dodier, Oct 16 '17 at 21:56

remram · Answer 1 · 2017-10-16T21:02:58.167

This sounds like a complex way to approach the problem. It will be difficult for you to make it work this way, because you are constraining yourself to a single forward pass.

One way to do this is to choose a random location to start a biome, and "expand" it to neighboring patches with some high probability (like 0.9).

(Note that there is an error on line 10 of your example code: you have to copy the inner list. Otherwise every row of areaMap is the same list object, and changing one row changes them all.)
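The inner-list problem can be seen in isolation. This is a minimal sketch with made-up names (row, aliased, copied), not the asker's code:

```python
# Appending the same list object repeatedly makes every row an alias of it.
row = [" "] * 3
aliased = [row for _ in range(2)]        # two references to one list
copied = [list(row) for _ in range(2)]   # two independent copies

aliased[0][0] = "#"
copied[0][0] = "#"

print(aliased)  # [['#', ' ', ' '], ['#', ' ', ' ']]
print(copied)   # [['#', ' ', ' '], [' ', ' ', ' ']]
```

This is why the script that follows builds each row with list(inner_list).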

import random
import sys


W = 78
H = 40

BIOMES = [
    ('#', 0.5, 5),
    ('.', 0.5, 5),
]

area_map = []

# Make empty map
inner_list = []
for i in range(W):
    inner_list.append(' ')
for i in range(H):
    area_map.append(list(inner_list))

def neighbors(x, y):
    if x > 0:
        yield x - 1, y
    if y > 0:
        yield x, y - 1
    if y < H - 1:
        yield x, y + 1
    if x < W - 1:
        yield x + 1, y

for biome, proba, locations in BIOMES:
    for _ in range(locations):
        # Random starting location (randrange picks an integer index directly)
        x = random.randrange(W)
        y = random.randrange(H)

        # Remember the locations to be handled next
        open_locations = [(x, y)]
        while open_locations:
            x, y = open_locations.pop(0)

            # Probability to stop
            if random.random() >= proba:
                continue

            # Propagate to neighbors, adding them to the list to be handled next.
            # nx, ny are used so that x, y are not rebound while iterating.
            for nx, ny in neighbors(x, y):
                if area_map[ny][nx] == biome:
                    continue
                area_map[ny][nx] = biome
                open_locations.append((nx, ny))

for y in range(H):
    for x in range(W):
        sys.stdout.write(area_map[y][x])
    sys.stdout.write('\n')

Of course a better method, the one usually used for those kinds of tasks (such as in Minecraft), is to use a Perlin noise function. If the value for a specific area is above some threshold, use the other biome. The advantages are:

- Lazy generation: you don't need to generate the whole area map in advance; you determine what type of biome is in an area when you actually need to know that area.
- It looks much more realistic.
- Perlin gives you real values as output, so you can use it for more things, like terrain height, or to blend multiple biomes (or you can use it for "wetness": have 0-20% be desert, 20-60% be grass, 60-80% be swamp, 80-100% be water).
- You can overlay multiple "sizes" of noise to give you details in each biome, for instance by simply multiplying them.
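The thresholding idea can be sketched as follows. This is not Perlin noise: to stay within the standard library it uses a simpler value noise (random values on an integer lattice, smoothly interpolated), and the names lattice_value, value_noise, biome_at and the scale parameter are assumptions made up for this illustration. The thresholds are the "wetness" bands from the list above.

```python
import math
import random

def lattice_value(ix, iy, seed=0):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    return random.Random(f"{seed}:{ix}:{iy}").random()

def smoothstep(t):
    """Ease curve so the noise has no visible creases at lattice lines."""
    return t * t * (3 - 2 * t)

def value_noise(x, y, seed=0):
    """Smoothly interpolated noise in [0, 1] at a real-valued position."""
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = smoothstep(x - x0), smoothstep(y - y0)
    top = (1 - tx) * lattice_value(x0, y0, seed) + tx * lattice_value(x0 + 1, y0, seed)
    bottom = (1 - tx) * lattice_value(x0, y0 + 1, seed) + tx * lattice_value(x0 + 1, y0 + 1, seed)
    return (1 - ty) * top + ty * bottom

def biome_at(x, y, scale=8.0, seed=0):
    """Decide the biome of one tile on demand; no map is stored anywhere."""
    wetness = value_noise(x / scale, y / scale, seed)
    if wetness < 0.2:
        return "desert"
    if wetness < 0.6:
        return "grass"
    if wetness < 0.8:
        return "swamp"
    return "water"

# Print a small window of the (conceptually unbounded) map, one letter per tile.
for y in range(5):
    print("".join(biome_at(x, y)[0] for x in range(20)))
```

Because biome_at depends only on the coordinates and the seed, asking for the same tile twice gives the same answer, which is what makes the lazy generation possible. A real Perlin or simplex implementation could be swapped in for value_noise without changing biome_at.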

Is there a way to have it generate a map entirely (no continents)? — , Oct 17 '17 at 04:39
Well, here the ' ' (whitespace) biome is also a biome. The problem is that adding a new biome overwrites the default one, and the ones drawn just before, so reaching the ratio of biomes you want with this method is going to be difficult. — remram, Oct 17 '17 at 05:04
I guess you can keep drawing the non-default biomes until the remaining proportion of the default biome is what you want. — remram, Oct 17 '17 at 05:04

score 0 · Accepted Answer · answered Oct 16 '17 at 20:58

I'd propose:

biomeProbabilities = biomeProbabilities / biomeProbabilities.sum()
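For a plain Python list, the same normalization can be sketched like this. The helper name normalize and the uniform fallback for an all-zero list are assumptions added for illustration; they are not part of the question's code.

```python
def normalize(weights):
    """Return a list of probabilities proportional to weights, summing to 1."""
    total = sum(weights)
    if total == 0:
        # No neighbour contributed anything: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

probs = normalize([0.4, 0.2, 0.0])     # roughly [0.667, 0.333, 0.0]
uniform = normalize([0.0, 0.0, 0.0])   # three equal probabilities
```

The resulting list can then be passed as the p argument of numpy.random.choice. The zero check matters for the question's code, where a tile with no biome neighbours yet would otherwise cause a division by zero.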

For your endless if statements I'd propose to use a preallocated array of directions, like:

directions = [(-1, -1), (0, -1), (1, -1),
              (-1,  0),          (1,  0),
              (-1,  1), (0,  1), (1,  1)]

and use it to iterate, like:

for tile_x, tile_y in tiles:
    for x, y in directions:
        neighbor = map[tile_x + x][tile_y + y]
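A slightly fuller sketch of that loop, with a bounds check so that edge tiles neither raise IndexError nor wrap around through negative indices. It assumes the row-major layout from the question (grid[y][x]); the function name neighbour_counts and the single-letter tile values are made up for the example.

```python
DIRECTIONS = [(-1, -1), (0, -1), (1, -1),
              (-1,  0),          (1,  0),
              (-1,  1), (0,  1), (1,  1)]

def neighbour_counts(grid, x, y):
    """Count how often each value occurs among the in-bounds neighbours of (x, y)."""
    counts = {}
    for dx, dy in DIRECTIONS:
        nx, ny = x + dx, y + dy
        # Skip positions outside the grid; negative indices would wrap silently.
        if 0 <= ny < len(grid) and 0 <= nx < len(grid[ny]):
            value = grid[ny][nx]
            counts[value] = counts.get(value, 0) + 1
    return counts

grid = [["W", "W", "D"],
        ["W", "F", "D"],
        ["F", "F", "D"]]
centre = neighbour_counts(grid, 1, 1)   # all eight neighbours are in bounds
corner = neighbour_counts(grid, 0, 0)   # only three neighbours exist
```

The counts can serve directly as unnormalized weights for each biome, replacing the long chain of if statements.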

@remram wrote a nice answer about the algorithm you may or may not use to generate terrain, so I won't go into that subject.

answered Oct 16 '17 at 20:58

Julien Palard

I'm confused by `biomeProbabilities = biomeProbabilities / biomeProbabilities.sum()`. `biomeProbabilities` is a list - it can't be divided. – Oct 17 '17 at 04:35
Also, wouldn't it be `sum(biomeProbabilities)`? – Oct 17 '17 at 04:36
Oh, sorry, I tested using a NumPy array, which can be divided. You may want to try them for this kind of work; they have a lot of useful methods. – Julien Palard Oct 17 '17 at 12:39
