3

How can I generate random walk data between given start and end values without going over a maximum value or under a minimum value?

Here is my attempt, but for some reason the series sometimes goes over the max or under the min value. The start and end values seem to be respected, but not the minimum and maximum. How can this be fixed? I would also like to specify the standard deviation of the fluctuations, but I don't know how. I currently use randomPerc to control the fluctuation, which is not what I want; I would like to specify the std instead.

import numpy as np
import matplotlib.pyplot as plt

def generateRandomData(length,randomPerc, min,max,start, end):
    data_np = (np.random.random(length) - randomPerc).cumsum()
    data_np *= (max - min) / (data_np.max() - data_np.min())
    data_np += np.linspace(start - data_np[0], end - data_np[-1], len(data_np))
    return data_np

randomData=generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)

## print values
print("Max Value",randomData.max())
print("Min Value",randomData.min())
print("Start Value",randomData[0])
print("End Value",randomData[-1])
print("Standard deviation",np.std(randomData))

## plot values
plt.figure()
plt.plot(range(randomData.shape[0]), randomData)
plt.show()
plt.close()

Here is a simple loop that checks for series that go under the minimum or over the maximum value. This is exactly what I am trying to avoid. The series should stay between the given min and max limits.

## generate 1000 series and check if there are any values over the maximum limit or under the minimum limit
for i in range(1000):
    randomData = generateRandomData(length = 1000, randomPerc = 0.5, min = 50, max = 100, start = 66, end = 80)
    if randomData.min() < 50:
        print(i, "Value Lower than Min limit")
    if randomData.max() > 100:
        print(i, "Value Higher than Max limit")
RaduS
  • Wow... that function name is a mouthful. Have you seen PEP guidelines? You should keep them short and meaningful. – cs95 Oct 26 '17 at 12:38
  • hmmm, good point, will rename it :) – RaduS Oct 26 '17 at 12:40
  • Let me understand. In your code you want to generate a random walk of 1000 steps, starting with 66 and ending with 80. The step sizes are to be uniformly distributed on [50, 100]. What is `randomPerc`? – Bill Bell Oct 26 '17 at 13:32
  • Yes, you are right. `randomPerc` is my attempt to define how much it fluctuates, and it ranges between 0.25 and 0.5. But I would like to give the standard deviation instead, or a percentage between 0 and 1 – RaduS Oct 26 '17 at 13:46
  • It seems to me that you can't have it all ways. If you add up 1,000 steps uniformly distributed over [50, 100] starting with 66 the probability of ending at 80 is zero. – Bill Bell Oct 26 '17 at 14:10
  • 1
    It sounds like you want a [Brownian bridge](https://en.wikipedia.org/wiki/Brownian_bridge), but with more constraints on the stochastic process. – Warren Weckesser Oct 26 '17 at 16:08
  • how would this Brownian bridge be applied here? – RaduS Oct 26 '17 at 16:13
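
A minimal sketch of the Brownian bridge idea from the comment above, for illustration only. The function name `brownian_bridge`, the `rng` argument, and the use of normally distributed steps are assumptions, not part of the question. The sketch pins the start and end values but does nothing to enforce the min and max limits, which is why more constraints are needed.

```python
import numpy as np

def brownian_bridge(length, start, end, std, rng=None):
    """Random walk pinned to `start` and `end` (no min/max limits enforced)."""
    rng = np.random.default_rng() if rng is None else rng
    # Free random walk starting at 0 with normal steps of scale `std`.
    steps = rng.normal(0.0, std, size=length - 1)
    walk = np.concatenate(([0.0], steps.cumsum()))
    # Subtract the straight line from 0 to the final value so that the
    # result starts and ends at 0: this is the bridge.
    t = np.linspace(0.0, 1.0, length)
    bridge = walk - t * walk[-1]
    # Add the bridge to the straight line from start to end.
    return start + t * (end - start) + bridge

series = brownian_bridge(1000, start=66, end=80, std=1.0)
print(series[0], series[-1], series.min(), series.max())
```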

4 Answers

5

Because you impose conditions on your walk, it cannot be considered purely random. Anyway, one way is to generate the walk iteratively and check the boundaries on each iteration. But if you want a vectorized solution, here it is:

import numpy as np

def bounded_random_walk(length, lower_bound, upper_bound, start, end, std):
    assert (lower_bound <= start and lower_bound <= end)
    assert (start <= upper_bound and end <= upper_bound)

    bounds = upper_bound - lower_bound

    rand = (std * (np.random.random(length) - 0.5)).cumsum()
    rand_trend = np.linspace(rand[0], rand[-1], length)
    rand_deltas = (rand - rand_trend)
    rand_deltas /= np.max([1, (rand_deltas.max()-rand_deltas.min())/bounds])

    trend_line = np.linspace(start, end, length)
    upper_bound_delta = upper_bound - trend_line
    lower_bound_delta = lower_bound - trend_line

    upper_slips_mask = (rand_deltas-upper_bound_delta) >= 0
    upper_deltas =  rand_deltas - upper_bound_delta
    rand_deltas[upper_slips_mask] = (upper_bound_delta - upper_deltas)[upper_slips_mask]

    lower_slips_mask = (lower_bound_delta-rand_deltas) >= 0
    lower_deltas =  lower_bound_delta - rand_deltas
    rand_deltas[lower_slips_mask] = (lower_bound_delta + lower_deltas)[lower_slips_mask]

    return trend_line + rand_deltas

randomData = bounded_random_walk(1000, lower_bound=50, upper_bound=100, start=50, end=100, std=10)

You can see it as the solution of a geometric problem. The trend_line connects your start and end points and has margins defined by lower_bound and upper_bound. rand is your random walk, rand_trend is its trend line, and rand_deltas is its deviation from that trend line. We align the two trend lines and want to make sure that the deltas don't exceed the margins. When rand_deltas exceeds the allowed margin, we "fold" the excess back inside the bounds.

At the end you add the resulting random deltas to the start=>end trend line, thus receiving the desired bounded random walk.

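The std parameter scales the size of the random steps and so controls how much the walk fluctuates. Each step is drawn uniformly from [-std/2, std/2], so std is not the exact standard deviation of the resulting series.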
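
As a quick check of what the std argument does to the individual steps, here is a small sketch. It reuses the step formula from the function above with a seeded generator (an assumption made for reproducibility). For a uniform distribution of width std, the standard deviation of a single step is std / sqrt(12).

```python
import numpy as np

rng = np.random.default_rng(0)
std = 10

# Same step formula as in bounded_random_walk: uniform on [-std/2, std/2).
steps = std * (rng.random(100_000) - 0.5)

step_std = steps.std()
expected = std / np.sqrt(12)  # standard deviation of a uniform of width std
print(step_std, expected)
```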

update : fixed assertions

In this version "std" is not promised to be the "interval".
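
For comparison, the iterative approach mentioned at the start of this answer could be sketched as follows. This is only an illustration under stated assumptions: the name `iterative_bounded_walk`, the normally distributed steps, the drift toward the end value, and clipping (rather than folding) at the bounds are all choices made here, not part of the vectorized solution.

```python
import numpy as np

def iterative_bounded_walk(length, lower_bound, upper_bound, start, end, std, rng=None):
    """Step-by-step walk from `start` to `end`, clipped to the bounds."""
    assert length >= 2
    assert lower_bound <= start <= upper_bound
    assert lower_bound <= end <= upper_bound
    rng = np.random.default_rng() if rng is None else rng

    walk = np.empty(length)
    walk[0] = start
    for i in range(1, length - 1):
        remaining = length - i  # steps left, including this one
        # Pull gently toward the end value so the walk arrives on time.
        drift = (end - walk[i - 1]) / remaining
        candidate = walk[i - 1] + drift + rng.normal(0.0, std)
        # Check the boundaries on each iteration (assumption: clip to them).
        walk[i] = min(max(candidate, lower_bound), upper_bound)
    walk[-1] = end  # pin the final value
    return walk

series = iterative_bounded_walk(1000, 50, 100, start=66, end=80, std=2.0)
print(series.min(), series.max(), series[0], series[-1])
```

Clipping makes the walk stick to a bound for a while when it hits one, so it distorts the distribution near the limits. The loop is also much slower than the vectorized version for long series.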

igrinis
  • That's it, this is exactly what i wanted, and it is a vectorized. Thank you @igrinis your solution is awesome :) – RaduS Oct 30 '17 at 06:22
  • One last thing: it throws an error when the lower and upper levels are the same as the start and the end. There are situations when the min and max limits are the same as start and end, and the series must not overflow at all. How would you approach them? Are there any modifications that can be made? – RaduS Oct 30 '17 at 22:15
  • It is possible, but then you lose control over the variance of the pseudo-random walk. See updated code. – igrinis Oct 31 '17 at 16:00
  • This is perfect, thank you for the update of the code, this is an awesome answer :) – RaduS Nov 02 '17 at 14:53
2

I noticed you used the names of built-in functions (min and max) as arguments, which is not recommended (I changed these to max_1 and min_1). Other than this, your code should work as expected:

def generateRandomData(length,randomPerc, min_1,max_1,start, end):
    data_np = (np.random.random(length) - randomPerc).cumsum()
    data_np *= (max_1 - min_1) / (data_np.max() - data_np.min())
    data_np += np.linspace(start - data_np[0], end - data_np[-1],len(data_np))
    return data_np
randomData=generateRandomData(1000, 0.5, 50, 100, 66, 80)

If you are willing to modify your code this will work:

import random
for_fill=[]
# generate 1000 samples within the specified range and save them in for_fill
for x in range(1000):
    generate_rnd_df=random.uniform(50,100)
    for_fill.append(generate_rnd_df)
#set starting and end point manually
for_fill[0]=60
for_fill[999]=80
xan
  • Hi Xan, good point, but if you run the generator over many iterations, you will notice that it still creates values over the max or under the min. This is exactly what I am trying to avoid – RaduS Oct 26 '17 at 13:41
  • `for i in range(1000): randomData = generateRandomData(1000000, 0.5, 50, 100, 66, 80) if(randomData.min() < 50): print("Value Lower than limit") elif(randomData.max() > 100): print("Value Bigger than limit")` – RaduS Oct 26 '17 at 13:47
  • I see that you edited the answer, but I still do not see how this creates a random walk series that meets the given requirements – RaduS Oct 26 '17 at 16:22
1

Here is one way, very crudely expressed in code.

>>> import random
>>> steps = 1000
>>> start = 66
>>> end = 80
>>> step_size = (50,100)

Generate 1,000 steps assured to be within the required range.

>>> crude_walk_steps = [random.uniform(*step_size) for _ in range(steps)]
>>> import numpy as np

Turn these steps into a walk but notice that they fail to meet the requirements.

>>> crude_walk = np.cumsum(crude_walk_steps)
>>> min(crude_walk)
57.099056617839288
>>> max(crude_walk)
75048.948693623403

Calculate a simple linear transformation that maps the walk's minimum to the start value and its maximum to the end value.

>>> from sympy import *
>>> var('a b')
(a, b)
>>> solve([57.099056617839288*a+b-66,75048.948693623403*a+b-80])
{b: 65.9893403510312, a: 0.000186686954219243}

Apply the transformation to the walk.

>>> walk = [0.000186686954219243*_+65.9893403510312 for _ in crude_walk]

Verify that the walk now starts and stops where intended.

>>> min(walk)
65.999999999999986
>>> max(walk)
79.999999999999986
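
The same linear transformation can be computed directly with NumPy, without sympy. The following is a sketch with a seeded generator (an assumption made for reproducibility); the variable names follow the session above.

```python
import numpy as np

rng = np.random.default_rng(0)
start, end = 66, 80

# Crude walk: cumulative sum of 1,000 steps drawn uniformly from [50, 100).
crude_walk = rng.uniform(50, 100, size=1000).cumsum()

# Solve a*min + b = start and a*max + b = end for a and b.
a = (end - start) / (crude_walk.max() - crude_walk.min())
b = start - a * crude_walk.min()

walk = a * crude_walk + b
print(walk.min(), walk.max())
```

Because every step is positive, the crude walk only ever increases, so the rescaled walk rises steadily from the start value to the end value and looks close to a straight line when plotted.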
Bill Bell
  • Hi Bill, I implemented the solution but I do not get the result I am after; I just get a straight line – RaduS Oct 26 '17 at 16:21
  • There are no y co-ordinates. – Bill Bell Oct 26 '17 at 17:04
  • 1
    You misunderstood the question. @RaduS wants to start the walk in certain value `start`, end it in value `end` and make it stay between the boundaries `min` and `max`. – igrinis Oct 30 '17 at 06:07
  • @igrinis: And I would say, if enough people offer guesses one will get an answer that matches his imagination. – Bill Bell Oct 30 '17 at 14:26
1

You can also generate a stream of random walks and filter out those that do not meet your constraints. Just be aware that by filtering they are not really 'random' anymore.

The code below creates an infinite stream of 'valid' random walks. Be careful with very tight constraints, the 'next' call might take a while ;).

import itertools
import numpy as np


def make_random_walk(first, last, min_val, max_val, size):
    # Generate a sequence of random steps of length `size-2`
    # that will be taken between the start and stop values.
    steps = np.random.normal(size=size-2)

    # The walk is the cumsum of those steps
    walk = steps.cumsum()

    # Performing the walk from the start value gives you your series.
    series = walk + first

    # Largest excursions of the walk above and below the start value.
    # (Assumes min_val <= first <= max_val.)
    up, down = walk.max(), walk.min()

    # Calculate the scale factor to constrain the walk within the
    # target min/max values on both sides.
    # Don't upscale.
    scale = 1.0
    if up > 0:
        scale = min(scale, (max_val - first) / up)
    if down < 0:
        scale = min(scale, (min_val - first) / down)

    # Generate the scaled series
    new_steps = steps * scale
    new_walk = new_steps.cumsum()
    new_series = new_walk + first

    # Check the size of the final step necessary to reach the target endpoint.
    last_step_size = abs(last - new_series[-1]) # step needed to reach desired end

    # Is it larger than the largest previously observed step?
    if last_step_size > np.abs(new_steps).max():
        # If so, consider this series invalid.
        return None
    else:
        # Else, we found a valid series that meets the constraints.
        return np.concatenate((np.array([first]), new_series, np.array([last])))


start = 66
stop = 80
max_val = 100
min_val = 50
size = 1000

# Create an infinite stream of candidate series
candidate_walks = (
    (i, make_random_walk(first=start, last=stop, min_val=min_val, max_val=max_val, size=size))
    for i in itertools.count()
)
# Filter out the invalid ones.
valid_walks = ((i, w) for i, w in candidate_walks if w is not None)

idx, walk = next(valid_walks)  # Get the next valid series
print(
    "Walk #{}: min/max({:.2f}/{:.2f})"
    .format(idx, walk.min(), walk.max())
)
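
Since the 'next' call can take a long time with tight constraints, one option is to cap the number of attempts. Below is a minimal sketch: the helper `first_valid`, its `max_attempts` parameter, and the simplified candidate stream (unconstrained walks of 200 normal steps starting at 66) are assumptions for illustration and are not part of the code above.

```python
import itertools
import numpy as np

def first_valid(candidates, is_valid, max_attempts=10_000):
    """Return (attempt_index, candidate) for the first valid one, else None."""
    limited = itertools.islice(candidates, max_attempts)
    for attempt, candidate in enumerate(limited):
        if is_valid(candidate):
            return attempt, candidate
    return None  # gave up after max_attempts candidates

rng = np.random.default_rng(0)

# Hypothetical candidate stream: plain random walks starting at 66.
candidates = (66 + rng.normal(size=200).cumsum() for _ in itertools.count())

# Accept only walks that stay within [50, 100].
result = first_valid(candidates, lambda w: w.min() >= 50 and w.max() <= 100)
if result is None:
    print("No valid walk found within the attempt limit")
else:
    idx, walk = result
    print("Walk #{}: min/max({:.2f}/{:.2f})".format(idx, walk.min(), walk.max()))
```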
NSteinhoff