You should be familiar with training and inference.
In the training phase, you provide inputs and the desired outputs to the neural network. The exact way in which you encode the desired outputs can vary; one way is to define a reward function. The weights adjustment procedure is then defined to optimize the reward
In production, the network is used for inference. You now use it to predict the unknown outcomes, but you don't update the weights. Therefore, you don't have a reward function in this phase.
This makes neural networks a form of supervised learning. If you need unsupervised learning, you generally have a bigger problem, and might need different algorithms. One sort-of exception is when you can automatically evaluate the quality of your predictions in hindsight. An example of this is the branch predictor of CPU's; this can be trained using the actual data from branches taken.