In theory you can build neural networks using any loss function. You could use mean squared error or cross-entropy, for example. It boils down to what is going to be the most effective. By most effective, I mean: what is going to allow you to learn the parameters more quickly and/or more accurately.
In practice, most neural networks tend to use cross-entropy. Many beginners' classes and tutorials on neural networks will show you mean squared error, as it is probably more intuitive and easier to understand at first.
This article explains it in more detail, but let me quote:
When should we use the cross-entropy instead of the quadratic cost? In fact, the cross-entropy is nearly always the better choice, provided the output neurons are sigmoid neurons.
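To make the difference concrete, here is a minimal sketch (using NumPy, with made-up numbers for the target and prediction) that computes both costs for a single prediction:

```python
import numpy as np

# Hypothetical example: a one-hot target for class 1,
# and a network's predicted probabilities for 3 classes.
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.1])

# Mean squared error (the quadratic cost)
mse = np.mean((y_true - y_pred) ** 2)

# Cross-entropy (a small epsilon guards against log(0))
eps = 1e-12
cross_entropy = -np.sum(y_true * np.log(y_pred + eps))

print(f"MSE:           {mse:.4f}")            # 0.0467
print(f"Cross-entropy: {cross_entropy:.4f}")  # 0.3567
```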
Regarding the softmax function: as you probably know, every neuron has an activation function, and very often that function is a sigmoid. The softmax function is another type of activation function, usually used in the last layer of your neural network. The softmax function has a useful property: each output is a value between 0 and 1, and the outputs of all the neurons in the layer sum to 1, effectively forming a probability distribution. This makes it very appropriate for multi-class classification, as it gives you a probability for each class, and you can pick the class with the highest probability.
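As an illustration, here is a minimal softmax sketch in NumPy (the raw output values are made up) demonstrating both properties: every output lies between 0 and 1, and together they sum to 1:

```python
import numpy as np

def softmax(z):
    """Softmax activation: exponentiate and normalize so outputs sum to 1."""
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# Hypothetical raw outputs (logits) from the last layer of a 3-class network
logits = np.array([2.0, 1.0, 0.1])

probs = softmax(logits)
print(probs)             # [0.659 0.242 0.099]
print(probs.sum())       # 1.0 -- the outputs form a probability distribution
print(np.argmax(probs))  # 0   -- pick the class with the highest probability
```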