In a 1-dimensional CNN, the kernel moves in one direction. The input and output of a 1D CNN are 2-dimensional (not counting the batch dimension), for example (time steps, channels). It is mostly used on time-series data, since the kernel can only slide left or right along a single axis (x).
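To make the 1D case concrete, here is a minimal pure-Python sketch of a single-filter "valid" convolution (strictly a cross-correlation, which is what CNN layers usually compute). The function name `conv1d` and the toy data are my own illustration, not any library's API, and the sketch assumes no padding and a stride of 1.

```python
def conv1d(series, kernel):
    """Valid 1D convolution with a single filter (illustrative sketch).

    series: shape (steps, channels), a list of time steps
    kernel: shape (k, channels), a list of k taps
    returns: list of length steps - k + 1
    """
    steps, k = len(series), len(kernel)
    out = []
    for t in range(steps - k + 1):  # the kernel slides along the time axis only
        acc = 0.0
        for i in range(k):
            for c in range(len(kernel[i])):
                acc += series[t + i][c] * kernel[i][c]
        out.append(acc)
    return out


# Toy input: 5 time steps, 2 channels (assumed values)
series = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
# Width-2 kernel that adds channel 0 of two neighbouring steps
kernel = [[1.0, 0.0], [1.0, 0.0]]
result = conv1d(series, kernel)
print(result)  # [3.0, 5.0, 7.0, 9.0]
```

With several filters, each one would produce such a list, which is where the second output dimension (steps, filters) comes from.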
In a 2-dimensional CNN, the kernel moves in two directions. The input and output of a 2D CNN are 3-dimensional (not counting the batch dimension), for example (height, width, channels). As you have mentioned, it is widely used in image-related tasks, since apart from left and right the kernel can also move up and down (x, y).
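The same idea in two dimensions, again as a hedged pure-Python sketch rather than a real library call: `conv2d` is a hypothetical helper, the image values are made up, and padding and stride are assumed to be "valid" and 1.

```python
def conv2d(image, kernel):
    """Valid 2D convolution with a single filter (illustrative sketch).

    image:  shape (height, width, channels)
    kernel: shape (kh, kw, channels)
    returns: shape (height - kh + 1, width - kw + 1)
    """
    h, w = len(image), len(image[0])
    kh, kw, ch = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for y in range(h - kh + 1):        # slide up/down
        row = []
        for x in range(w - kw + 1):    # slide left/right
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    for c in range(ch):
                        acc += image[y + i][x + j][c] * kernel[i][j][c]
            row.append(acc)
        out.append(row)
    return out


# Toy 3x4 image with 1 channel; pixel value = 10 * row + column
image = [[[float(10 * r + c)] for c in range(4)] for r in range(3)]
# 2x2 kernel that adds the top-left and bottom-right pixels of each window
kernel = [[[1.0], [0.0]], [[0.0], [1.0]]]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[11.0, 13.0, 15.0], [31.0, 33.0, 35.0]]
```

The result is one 2D feature map per filter; stacking the maps of several filters gives the 3D output (height, width, filters).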
In a 3-dimensional CNN, the kernel moves in three directions. The input and output of a 3D CNN are 4-dimensional (not counting the batch dimension), for example (depth, height, width, channels). Since the kernel slides in three dimensions, the possible movements are (x, y, z). One example use case is medical imaging, where 3-dimensional images are acquired slice by slice and then reconstructed. All the slices together must be analysed as a whole, so it makes no sense to take single images and apply a 2-dimensional convolution to each, because the relationships between slices are lost. You need to stack all the images into a "3D" representation and analyse it with 3-dimensional convolutions.
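A small sketch of why the third axis matters, under the same assumptions as before (pure Python, single filter, no padding, stride 1; `conv3d` and the toy volume are hypothetical). The kernel here spans two neighbouring slices, so its output depends on how the slices relate to each other, which a per-slice 2D convolution cannot see.

```python
def conv3d(volume, kernel):
    """Valid 3D convolution with a single filter (illustrative sketch).

    volume: shape (depth, height, width, channels)
    kernel: shape (kd, kh, kw, channels)
    returns: shape (depth - kd + 1, height - kh + 1, width - kw + 1)
    """
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    ch = len(kernel[0][0][0])
    out = []
    for z in range(d - kd + 1):            # slide across slices
        plane = []
        for y in range(h - kh + 1):        # slide up/down
            row = []
            for x in range(w - kw + 1):    # slide left/right
                acc = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            for c in range(ch):
                                acc += (volume[z + i][y + j][x + k][c]
                                        * kernel[i][j][k][c])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy volume: 3 stacked 2x2 slices, 1 channel; every voxel in slice z is z**2
volume = [[[[float(z * z)] for _ in range(2)] for _ in range(2)]
          for z in range(3)]
# 2x1x1 kernel: (next slice) - (current slice) at the same pixel
kernel = [[[[-1.0]]], [[[1.0]]]]
diff = conv3d(volume, kernel)
print(diff)  # [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
```

In practice you would use a framework layer (for example a 3D convolution layer in Keras or PyTorch) instead of nested loops; the loops are only meant to show the three sliding directions.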