I am currently working on my own RISC-V (rv64gc) emulator. Everything has gone smoothly so far; however, the floating-point rounding modes are giving me a headache.
The RISC-V ISA defines the following five floating-point rounding modes:
- RNE (Round to Nearest, ties to Even)
- RTZ (Round towards Zero)
- RDN (Round down / towards negative infinity)
- RUP (Round up / towards positive infinity)
- RMM (Round to Nearest, ties to Max Magnitude)
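For reference, here is a minimal sketch of how the mode could be decoded in an emulator. It assumes the encodings from the rounding-mode table of the F extension (RNE=000, RTZ=001, RDN=010, RUP=011, RMM=100, DYN=111), the `rm` field in instruction bits 14:12 and the `frm` field in bits 7:5 of `fcsr`. The names `RoundingMode` and `effective_rounding_mode` are made up for illustration.

```python
from enum import IntEnum


class RoundingMode(IntEnum):
    RNE = 0b000  # round to nearest, ties to even
    RTZ = 0b001  # round towards zero
    RDN = 0b010  # round down (towards negative infinity)
    RUP = 0b011  # round up (towards positive infinity)
    RMM = 0b100  # round to nearest, ties to max magnitude
    DYN = 0b111  # only valid in an instruction: use frm from fcsr


def effective_rounding_mode(insn: int, fcsr: int) -> RoundingMode:
    """Resolve the rounding mode for one floating-point instruction."""
    rm = (insn >> 12) & 0b111        # static rm field of the instruction
    if rm == RoundingMode.DYN:
        rm = (fcsr >> 5) & 0b111     # dynamic mode: take frm from fcsr
    if rm > RoundingMode.RMM:
        # Reserved encoding; an emulator would presumably raise an
        # illegal-instruction exception here (assumption).
        raise ValueError(f"reserved rounding mode {rm:03b}")
    return RoundingMode(rm)
```

For example, an instruction word with `rm = 111` and `fcsr = 0b011 << 5` would resolve to `RoundingMode.RUP` under these assumptions.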
When thinking about the instructions that convert floats to integers (e.g. FCVT.W.S), these modes all make sense. However, these aren't the only instructions with an encoded rounding mode. The instructions that convert integers to floats also have a 3-bit field for the rounding mode, as do all the floating-point arithmetic instructions.
Now let's say we have two floats and want to add them together. If one of them is a large number and the other is a small number with lots of digits after the binary point, the exact sum might need more bits than a float can store. Whenever this happens, are the lowest bits/digits just discarded? If yes, why is a rounding mode given at all? Otherwise, how do the different modes work, and what do they round to?
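To make the situation concrete, here is a small sketch of such an addition for single precision. It assumes that `struct.pack("<f", ...)` rounds a Python float (binary64) to the nearest binary32 value with ties to even, which is the behaviour on typical IEEE 754 platforms. The helper `to_f32` is a name chosen for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


big = 16777216.0    # 2**24, exactly representable in binary32
small = 1.0         # also exactly representable

exact = big + small  # 16777217.0, still exact in binary64

# In [2**24, 2**25) adjacent binary32 values are 2 apart, so the exact
# sum lies halfway between these two representable neighbours:
lower = 16777216.0
upper = 16777218.0

stored = to_f32(exact)  # whichever neighbour the conversion picks
print(exact, "is stored as", stored)
```

The exact sum needs 25 significand bits, one more than binary32 provides, so the stored result has to be either `lower` or `upper`. Which of the two it should be in each mode is exactly what I am unsure about.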
In general, rounding after discarding makes no sense to me. Without any extra bits available, the low bits would have to be discarded first, and once they are gone the storage is large enough for what is left of the original number, so there is no reason to reduce precision further by rounding. So does the rounding happen before the last bits are cut off, with the resulting zeros being discarded afterwards?
Example:
Imagine we have a mantissa of 011010111 (9 bits) after adding two numbers, but a mantissa can hold at most 8 bits, so we have to get rid of 1 bit.
RNE: Option 1 is 011010110 (down), Option 2 is 011011000 (up)
This is a tie: which option would it choose?
With either option, no further data is lost, because only a 0 is discarded.
RTZ: The only option is 011010110 (towards zero / down)
The last zero can now be discarded without any further data loss.
RDN and RUP: Depending on the sign bit, there is always only one way to go, and the last bit becomes 0, so no further data is lost when discarding that bit.
RMM: This also always has only one option (away from zero / up in this example).
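The example above can be sketched in code as follows. This is only my reading of the mode definitions applied to a sign-magnitude mantissa, not a verified implementation: `round_drop_bits` is a hypothetical helper, the modes are passed as plain strings, and a carry out of the top mantissa bit (which would need an exponent adjustment) is not handled.

```python
def round_drop_bits(sig: int, drop: int, negative: bool, mode: str) -> int:
    """Drop the lowest `drop` bits of the magnitude `sig`, rounding per `mode`."""
    kept = sig >> drop                 # bits that fit into the result
    rem = sig & ((1 << drop) - 1)      # bits that are about to be discarded
    if rem == 0:
        return kept                    # exactly representable: nothing to round
    half = 1 << (drop - 1)             # value of a tie in the discarded bits
    if mode == "RTZ":
        increment = False                          # truncate the magnitude
    elif mode == "RDN":
        increment = negative                       # more negative = larger magnitude
    elif mode == "RUP":
        increment = not negative                   # more positive = larger magnitude
    elif mode == "RMM":
        increment = rem >= half                    # nearest, ties away from zero
    elif mode == "RNE":
        # nearest; on a tie pick the result whose last kept bit is 0
        increment = rem > half or (rem == half and (kept & 1) == 1)
    else:
        raise ValueError(f"unknown rounding mode {mode!r}")
    return kept + int(increment)


for mode in ("RNE", "RTZ", "RDN", "RUP", "RMM"):
    result = round_drop_bits(0b011010111, 1, negative=False, mode=mode)
    print(mode, format(result, "08b"))
```

Under these assumptions, the positive mantissa 011010111 becomes 01101100 for RNE, RUP and RMM, and 01101011 for RTZ and RDN. With the sign bit set, RDN and RUP swap their results.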
In another example where the least significant bit is already 0, is the number simply not rounded, since the bit can be dropped without losing anything and incrementing/decrementing would only introduce an error?
If rounding does happen before the bits are discarded, does the CPU just temporarily hold a wider result while the instruction executes, which is then used to produce the rounded result of the correct size?
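One way I could imagine such a "bigger temporary result" is a pair of extra bits below the kept mantissa: a guard bit (the first discarded bit) and a sticky bit (the OR of all remaining discarded bits). The sketch below models that idea. It is an assumption about how this might be done, not a description of any particular CPU, and `guard_sticky` and `needs_increment` are hypothetical names.

```python
def guard_sticky(sig: int, drop: int) -> tuple[int, int, int]:
    """Split a wide magnitude into (kept bits, guard bit, sticky bit); drop >= 1."""
    kept = sig >> drop
    guard = (sig >> (drop - 1)) & 1                     # first discarded bit
    sticky = int(sig & ((1 << (drop - 1)) - 1) != 0)    # any lower bit set?
    return kept, guard, sticky


def needs_increment(lsb: int, guard: int, sticky: int,
                    negative: bool, mode: str) -> bool:
    """Decide from three bits and the sign whether the kept magnitude grows by 1."""
    inexact = bool(guard or sticky)
    if mode == "RTZ":
        return False
    if mode == "RDN":
        return negative and inexact
    if mode == "RUP":
        return (not negative) and inexact
    if mode == "RMM":
        return bool(guard)                              # at or above the halfway point
    if mode == "RNE":
        return bool(guard and (sticky or lsb))          # above half, or tie with odd lsb
    raise ValueError(f"unknown rounding mode {mode!r}")


kept, guard, sticky = guard_sticky(0b10110101, 4)       # keep 1011, discard 0101
print(kept, guard, sticky,
      needs_increment(kept & 1, guard, sticky, False, "RNE"))
```

If this picture is right, the full-width intermediate result would never need to be stored, only the kept bits plus these two summary bits.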
If I have got something fundamentally wrong, please correct me. Any help is appreciated!