You typically call `env.reset()` at the end of each episode: either when you have reached a terminal state of the MDP, or when you have hit the maximum number of time steps per episode (a limit you choose yourself). I also call it once at the very start of training.
So if your starting state is 'A' and you want to reach state 'Z', you step through 'A' -> 'B' -> 'C' -> ... and, once you reach the terminal state 'Z', you call reset to start a new episode, which takes you back to 'A'.
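```python
# Classic Gym API (before 0.26): reset() returns the observation,
# step() returns (observation, reward, done, info).
for episode in range(iterations):
    state = env.reset()  # first state of the new episode
    for time_step in range(1000):  # max number of steps per episode
        action = take_action(state)  # your policy
        state, reward, done, _ = env.step(action)
        if done:
            break  # go to the next episode, where the environment is reset
```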
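If you want to see both reset conditions (terminal state vs. step limit) without installing Gym, here is a minimal sketch. `ChainEnv`, `take_action` and `run` are hypothetical names made up for illustration; `ChainEnv` only mimics the classic Gym `reset()`/`step()` convention (4-tuple from `step`) on the 'A' -> ... -> 'Z' chain described above, and the policy is assumed to always step forward so the results are deterministic.

```python
import string


class ChainEnv:
    """Hypothetical toy MDP: states 'A'..'Z', where 'Z' is terminal."""

    def __init__(self):
        self.states = string.ascii_uppercase  # 'A' ... 'Z'
        self.pos = 0

    def reset(self):
        # Start a new episode back at 'A'
        self.pos = 0
        return self.states[self.pos]

    def step(self, action):
        # Action 1 moves one state forward, action 0 stays put
        self.pos = min(self.pos + action, len(self.states) - 1)
        state = self.states[self.pos]
        done = state == "Z"
        reward = 1.0 if done else 0.0
        return state, reward, done, {}


def take_action(state):
    # Hypothetical policy: always move forward
    return 1


def run(num_episodes, max_steps):
    env = ChainEnv()
    endings = []
    for episode in range(num_episodes):
        state = env.reset()  # every episode starts again from 'A'
        for t in range(max_steps):
            action = take_action(state)
            state, reward, done, _ = env.step(action)
            if done:
                endings.append(("terminal", t + 1))  # reached 'Z'
                break
        else:
            # for-else: the inner loop finished without break, so the step limit was hit
            endings.append(("time_limit", max_steps))
    return endings


if __name__ == "__main__":
    print(run(num_episodes=2, max_steps=1000))  # [('terminal', 25), ('terminal', 25)]
    print(run(num_episodes=2, max_steps=10))    # [('time_limit', 10), ('time_limit', 10)]
```

The `for ... else` records episodes cut off by the step limit; either way the next iteration of the outer loop calls `reset()`, so every episode starts from 'A' again.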