I can't speak for the specific designs of either of the two solutions you are considering. I can, however, explain a design that could be implemented in software and could scale to millions of TCP connections on a single IP if deployed on a sufficient number of standard machines.
First of all, the IP address can be anycast to a pool of machines responsible for handling incoming packets.
For this to work, the machines must be able to communicate with each other, so each one must also have a unicast IP address. Since those unicast addresses are not used to communicate with your clients, they can be RFC 1918 addresses or IPv6 addresses, so that you have enough addresses for communication between your machines.
When a packet is received, the client IP and port are looked up in a table on the machine that received the packet. If an entry is found, it indicates which backend is to handle the packet. The receiving machine then encapsulates the packet and sends it through a tunnel to that backend.
If no entry is found, the client IP and port are hashed to produce two (for redundancy) indexes into a distributed hash table. All the machines must use the same hash function for this to work.
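To make the hashing step concrete, here is a minimal sketch in Python. The function name `dht_nodes`, the use of SHA-256, and the idea of numbering the pool machines from 0 to `pool_size - 1` are my own assumptions for illustration; any hash function works as long as every machine uses the same one.

```python
import hashlib
import ipaddress
from typing import Tuple


def dht_nodes(client_ip: str, client_port: int, pool_size: int) -> Tuple[int, int]:
    """Map a client (IP, port) to two distinct machine indexes in the pool.

    Hypothetical helper: the pool machines are assumed to be numbered
    0 .. pool_size - 1, and SHA-256 is an arbitrary choice of hash.
    """
    if pool_size < 2:
        raise ValueError("need at least two machines for redundancy")
    # Build a canonical key so every machine hashes exactly the same bytes.
    key = ipaddress.ip_address(client_ip).packed + client_port.to_bytes(2, "big")
    digest = hashlib.sha256(key).digest()
    first = int.from_bytes(digest[:8], "big") % pool_size
    # The offset lies in 1 .. pool_size - 1, so the second index
    # can never be equal to the first one.
    offset = 1 + int.from_bytes(digest[8:16], "big") % (pool_size - 1)
    second = (first + offset) % pool_size
    return first, second
```

Because the result depends only on the client IP, the port and the pool size, any machine in the pool that receives a packet from the same client computes the same pair of indexes.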
If the packet is a SYN packet, the receiving machine picks a backend, sends the information to the two machines chosen by the hash, and stores it in its own table. This happens in parallel with forwarding the packet to the backend.
If the packet is not a SYN packet, it is held on the receiving machine while that machine asks the two machines chosen by the hash (in parallel) which backend should process it. Once the first reply comes back indicating where the packet is to be sent, the packet is forwarded to the backend, and the mapping to that backend is stored in the local table and sent to the other of the two machines picked by the hash.
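The lookup, SYN and non-SYN cases described above can be sketched as a small in-memory simulation. Everything here is hypothetical and only meant to show the control flow: the `Machine` class, the round-robin `pick_backend` policy and the SHA-256 hash are my own choices, the messages between machines are replaced by direct dictionary access, and the two owners are asked one after the other rather than in parallel. What to do with a non-SYN packet that nobody knows about is not covered by the design, so the sketch just returns `None`.

```python
import hashlib
import ipaddress
from typing import Dict, List, Optional, Tuple

Flow = Tuple[str, int]  # (client IP, client port)


class Machine:
    """One member of the anycast pool with its local flow table (hypothetical)."""

    def __init__(self, pool: List["Machine"], backends: List[str]) -> None:
        self.pool = pool            # shared list of all pool members (needs >= 2)
        self.backends = backends    # unicast addresses of the backends
        self.table: Dict[Flow, str] = {}
        self.next_backend = 0
        pool.append(self)

    def owners(self, flow: Flow) -> Tuple["Machine", "Machine"]:
        """Return the two distinct machines chosen by the shared hash."""
        ip, port = flow
        key = ipaddress.ip_address(ip).packed + port.to_bytes(2, "big")
        digest = hashlib.sha256(key).digest()
        n = len(self.pool)
        first = int.from_bytes(digest[:8], "big") % n
        offset = 1 + int.from_bytes(digest[8:16], "big") % (n - 1)
        return self.pool[first], self.pool[(first + offset) % n]

    def pick_backend(self) -> str:
        """Placeholder policy: plain round robin over the backends."""
        backend = self.backends[self.next_backend % len(self.backends)]
        self.next_backend += 1
        return backend

    def handle(self, flow: Flow, is_syn: bool) -> Optional[str]:
        """Return the backend the packet should be tunnelled to, if known."""
        backend = self.table.get(flow)
        if backend is not None:
            return backend  # local table hit
        first, second = self.owners(flow)
        if is_syn:
            # New connection: choose a backend and record it locally
            # and on both machines chosen by the hash.
            backend = self.pick_backend()
            for machine in (self, first, second):
                machine.table[flow] = backend
            return backend
        # Existing connection unknown here: ask the two owners. The first
        # answer wins and is copied to the local table and the other owner.
        for owner, other in ((first, second), (second, first)):
            backend = owner.table.get(flow)
            if backend is not None:
                self.table[flow] = backend
                other.table[flow] = backend
                return backend
        return None  # no machine knows this flow
```

As a usage example, create a shared list with `pool = []`, build a few machines with `Machine(pool, ["10.0.0.1", "10.0.0.2"])`, call `handle(flow, True)` on one of them for a SYN, and then call `handle(flow, False)` on a different one. The second call returns the same backend, which is what keeps a connection working when routing changes deliver its packets to a different member of the anycast pool.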
The backend must be configured such that the public IP is assigned to a dummy interface (Linux has a dummy network driver for this sort of purpose). The interfaces that packets are actually routed through must have unicast addresses. This way the TCP stack on the backend will happily accept TCP connections to the public address, but it won't use that address for connections that it initiates itself.
Replies from the backend are simply sent with the public IP as the source address, without going through the anycast pool (this approach is known as Direct Server Return or DSR).