Current network stacks (and commodity OSes in general) have evolved incrementally from models built around simple NICs feeding unicore CPUs. When multicore machines became prevalent and the scalability of the software stack became a serious concern, significant effort went into adapting these models to take advantage of multiple cores.
As with any other rule hardcoded in NIC hardware, the main drawback of RSS is that the OS has little or no influence over how flows are assigned to queues.
These drawbacks can be overcome by using more flexible NIC filters, or by assigning flows to queues intelligently in software built into the operating system.
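To make that limitation concrete, here is a simplified C sketch of how RSS picks a receive queue. The names, table size, and hash function are made up for illustration; real NICs typically compute a Toeplitz hash over the flow's 5-tuple with a secret key and index a vendor-specific indirection table. The point is that the OS only programs the table, while per-flow placement falls out of the hash:

/*
 * Simplified sketch of RSS queue selection. Names and the hash are
 * illustrative, not any particular NIC's interface.
 */
#include <stdint.h>
#include <stdio.h>

#define RSS_TABLE_SIZE 128   /* indirection table entries (assumed) */
#define NUM_QUEUES     8     /* receive queues exposed by the NIC */

/* The OS programs this table once; the hardware consults it per packet. */
static uint8_t rss_indirection_table[RSS_TABLE_SIZE];

/* Stand-in for the NIC's flow hash (real NICs use Toeplitz with a key). */
static uint32_t
flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
	uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);
	h ^= h >> 16;
	h *= 0x45d9f3b;
	h ^= h >> 16;
	return (h);
}

/* The hardware's decision: hash -> table index -> queue. */
static unsigned
rss_pick_queue(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
	uint32_t h = flow_hash(saddr, daddr, sport, dport);
	return (rss_indirection_table[h % RSS_TABLE_SIZE]);
}

int
main(void)
{
	/* The OS's only lever: spread table entries across queues. */
	for (unsigned i = 0; i < RSS_TABLE_SIZE; i++)
		rss_indirection_table[i] = i % NUM_QUEUES;

	/*
	 * Which queue a given flow lands on is now fixed by the hash;
	 * the OS can't steer this one flow without also moving every
	 * other flow that shares its table entry.
	 */
	printf("flow -> queue %u\n",
	    rss_pick_queue(0x0a000001, 0x0a000002, 12345, 443));
	return (0);
}

Because many flows share each indirection-table entry, remapping a single flow means remapping everything that hashes alongside it, which is why the OS's control is so coarse.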
The following ASCII diagram shows how the ring might look after the hardware has received two packets and delivered an interrupt to the OS:
+--------------+ <----- OS Pointer
| Descriptor 0 |
+--------------+
| Descriptor 1 |
+--------------+ <----- Hardware Pointer
| Descriptor 2 |
+--------------+
| ... |
+--------------+
| Descriptor n |
+--------------+
When the OS receives the interrupt, it reads the hardware pointer's position and processes the packets in the descriptors between its own pointer and the hardware's. Once it's done, it doesn't need to do anything further until it has prepared those descriptors with fresh buffers; when it has, it updates its pointer by writing to the hardware. For example, if the OS has processed those first two descriptors and then updated the hardware, the ring will look something like:
+--------------+
| Descriptor 0 |
+--------------+
| Descriptor 1 |
+--------------+ <----- Hardware Pointer, OS Pointer
| Descriptor 2 |
+--------------+
| ... |
+--------------+
| Descriptor n |
+--------------+
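To tie the receive side together, here is a rough, self-contained C sketch of the kind of loop a driver runs when that interrupt arrives. The descriptor layout, the register accessors (hw_read_rx_tail, hw_write_rx_head), and the simulated hardware are all hypothetical; real devices often expose their position via a status register or a "done" bit in each descriptor, but the overall shape is the same:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RING_SIZE 8	/* small so the example is easy to follow */

typedef struct rx_descriptor {
	uint64_t	rd_buffer_addr;	/* DMA address of a posted buffer */
	uint16_t	rd_length;	/* filled in by hardware on receive */
} rx_descriptor_t;

typedef struct rx_ring {
	rx_descriptor_t	rr_descs[RING_SIZE];
	uint32_t	rr_os_head;	/* OS pointer: next descriptor to look at */
} rx_ring_t;

/*
 * Stand-ins for the device. A real driver would read and write
 * memory-mapped registers; here we fake the hardware pointer.
 */
static uint32_t sim_hw_tail;

static uint32_t hw_read_rx_tail(void) { return (sim_hw_tail); }
static void hw_write_rx_head(uint32_t head) {
	printf("told hardware descriptors before %u are ready again\n", head);
}
static void deliver_packet(uint64_t addr, uint16_t len) {
	printf("packet: buffer 0x%llx, %u bytes\n",
	    (unsigned long long)addr, len);
}
static uint64_t alloc_dma_buffer(void) {
	return ((uint64_t)(uintptr_t)malloc(2048));
}

static void
rx_interrupt(rx_ring_t *ring)
{
	uint32_t hw_tail = hw_read_rx_tail();

	/* Process every descriptor between the OS pointer and the hardware's. */
	while (ring->rr_os_head != hw_tail) {
		rx_descriptor_t *rd = &ring->rr_descs[ring->rr_os_head];

		deliver_packet(rd->rd_buffer_addr, rd->rd_length);

		/* Re-arm the descriptor with a fresh buffer. */
		rd->rd_buffer_addr = alloc_dma_buffer();
		rd->rd_length = 0;

		ring->rr_os_head = (ring->rr_os_head + 1) % RING_SIZE;
	}

	/* Update the OS pointer in hardware so the descriptors can be reused. */
	hw_write_rx_head(ring->rr_os_head);
}

int
main(void)
{
	rx_ring_t ring = { .rr_os_head = 0 };

	for (int i = 0; i < RING_SIZE; i++)
		ring.rr_descs[i].rd_buffer_addr = alloc_dma_buffer();

	/* Pretend the hardware filled descriptors 0 and 1 and interrupted us. */
	ring.rr_descs[0].rd_length = 64;
	ring.rr_descs[1].rd_length = 1500;
	sim_hw_tail = 2;

	rx_interrupt(&ring);
	return (0);
}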
Sending packets is similar. The OS fills in descriptors and then notifies the hardware. Once the hardware has sent them out on the wire, it injects an interrupt and indicates which descriptors it has written to the network, allowing the OS to free the associated memory.
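A matching sketch of the transmit side, again with hypothetical descriptor and register names: one function fills descriptors and rings the doorbell, and the completion interrupt frees the buffers the hardware reports as sent.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TX_RING_SIZE 8

typedef struct tx_descriptor {
	uint64_t	td_buffer_addr;	/* DMA address of the packet to send */
	uint16_t	td_length;
} tx_descriptor_t;

typedef struct tx_ring {
	tx_descriptor_t	tr_descs[TX_RING_SIZE];
	uint32_t	tr_os_tail;	/* next free descriptor to fill */
	uint32_t	tr_os_head;	/* oldest descriptor not yet completed */
} tx_ring_t;

/* Stand-in for the doorbell register write that notifies the hardware. */
static void
hw_write_tx_tail(uint32_t tail)
{
	printf("doorbell: descriptors before %u are ready to send\n", tail);
}

/* OS side: fill in a descriptor, then notify the hardware. */
static void
tx_send(tx_ring_t *ring, void *pkt, uint16_t len)
{
	tx_descriptor_t *td = &ring->tr_descs[ring->tr_os_tail];

	td->td_buffer_addr = (uint64_t)(uintptr_t)pkt;
	td->td_length = len;
	ring->tr_os_tail = (ring->tr_os_tail + 1) % TX_RING_SIZE;

	hw_write_tx_tail(ring->tr_os_tail);
}

/* Completion side: the hardware reports how far it has gotten on the wire. */
static void
tx_completion_interrupt(tx_ring_t *ring, uint32_t hw_done)
{
	while (ring->tr_os_head != hw_done) {
		tx_descriptor_t *td = &ring->tr_descs[ring->tr_os_head];

		/* Safe to free now: the packet is on the wire. */
		free((void *)(uintptr_t)td->td_buffer_addr);
		ring->tr_os_head = (ring->tr_os_head + 1) % TX_RING_SIZE;
	}
}

int
main(void)
{
	tx_ring_t ring = { .tr_os_tail = 0, .tr_os_head = 0 };

	tx_send(&ring, malloc(64), 64);
	tx_send(&ring, malloc(1500), 1500);

	/* Pretend the hardware interrupted after sending both packets. */
	tx_completion_interrupt(&ring, 2);
	return (0);
}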