Hypothesis is an excellent possibility for your use case - if you use it right. First off, why it works: it's not random, but pseudorandom. When a test fails with a complex example, it will tone down the complexity until it finds the minimal test case that fails, and gives you that. Then it also stores a database of failing test cases, so replaying old failures is one of the first things it tries.
Now, the drawbacks are that building a testcase generally takes a long time, but the benefit is that you can be really sure of your code being robust.
I have no idea what your code looks like, but just to give you a mock-up:
from hypothesis import strategies as st
from hypothesis import assume
# In this example, damage and miss chance are based
# on the length of the name of the attack
samus = Character(health=100, attacks=['punch', 'shoot'])
wario = Character(health=70, attacks=['growl', 'punch'])
bowser = Character(health=250, attacks=['growl', 'kidnap_princess'])
st_character = st.sampled_from([samus, wario, bowser])
st_n_rounds = st.integer(min=0, max=10)
@st.composite
def fight_sequence(draw):
player_one = draw(st_character)
player_two = draw(st_character)
# don't test entire fights, just simulate one, record the steps,
# and check that the end state is what you expect
actions = [
dict(type='choose', player_number=1, player=player_one),
dict(type='choose', player_number=2, player=player_two)
]
# this filters out all test cases where players have the same character
assume(player_one != player_two)
n_rounds = draw(st_n_rounds)
both_alive = True
def _attack(player, other):
if not both_alive:
return
attack = draw(player.attacks)
response = draw(st.integers(min=0, max=len(attack)))
response_type = 'miss' if response == 0 else 'crit' if response == len(attack)) else 'hit'
actions.push(dict(type='attack', player=player, attack=attack, response=response_type))
if response_type == 'hit':
other.health -= len(attack)
elif response_type == 'crit':
other.health -= len(attack) * 2
if other.health <= 0:
actions.push(dict(type='ko', player=other))
for _ in range(n_rounds):
_attack(player_one, player_two)
_attack(player_two, player_one)
return actions
Then in your test case, feed the playback script to your code and check that the results line up. I hope you can use this for inspiration.