I'm looking at the FrozenLake environments in OpenAI Gym (the 4x4 and 8x8 maps). In both of them, every reward is zero until the agent reaches the goal, which gives +1; there are no negative rewards at all. Even if the agent falls through the ice, the episode simply ends with a reward of 0. Without a reward signal, there is nothing to learn! Each episode starts from scratch with no benefit from previous episodes.
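For concreteness, here's a quick probe of what I'm seeing (this assumes the classic pre-0.26 gym API; the environment ID and the `is_slippery` flag may differ in your version):

```python
import gym

# Classic gym API: reset() returns obs, step() returns a 4-tuple.
# (Newer gymnasium returns (obs, info) from reset and a 5-tuple from step.)
env = gym.make("FrozenLake-v1", is_slippery=False)
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)
    print(obs, reward, done)                    # reward stays 0.0 unless the goal (+1.0) is reached
```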
This could be solved by a simple breadth-first search; it doesn't need RL. But assuming you use RL, one approach would be a reward of -1 for each step onto a frozen square (that isn't the goal) and a reward of -10 for stepping into a hole. The -1 would discourage the agent from revisiting squares, and the -10 would teach it to avoid the holes. So I'm tempted to add my own negative rewards on the agent side (a sketch of such a wrapper is below). That would make the problem more like the CliffWalking environment.
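Something like this `gym.Wrapper` is what I have in mind. This is my own shaping scheme, not anything built into the environment; it assumes the 4-tuple step API and that a zero-reward terminal step means the agent fell into a hole:

```python
import gym

class ShapedRewards(gym.Wrapper):
    """Agent-side reward shaping for FrozenLake: -1 per frozen step,
    -10 for falling into a hole; the goal's +1 is left unchanged."""
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done and reward == 0.0:
            # Episode ended without the goal's +1: assume a hole.
            # Caveat: the TimeLimit wrapper can also end an episode with
            # reward 0; old gym flags that via info["TimeLimit.truncated"].
            reward = -10.0
        elif reward == 0.0:
            # Ordinary step onto a frozen square.
            reward = -1.0
        return obs, reward, done, info

env = ShapedRewards(gym.make("FrozenLake-v1"))
```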
What am I missing? How would RL solve this (other than by random exploration) with no intermediate rewards?