Redundancy comes in a few different forms. If you're talking about hardware redundancy, Windows clustering is a pretty good option. In your bad-RAM situation, you would have failed over to the passive node, taken a minute or so of downtime, and then been able to focus on fixing the failed node without the pressure of being fully down.
Windows clustering works well, but there is a learning curve if you've never used it before. This is absolutely something you want to test in a lab first: give yourself plenty of time to practice failovers, failing back, etc., so you're really comfortable before going into production.
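To give a feel for what that lab practice looks like, here's a rough PowerShell sketch using the FailoverClusters cmdlets. The node, cluster, role names, and IP are placeholders for your own lab environment, and exact syntax varies a bit between Windows Server versions, so treat this as a starting point rather than a recipe:

```shell
# Validate the nodes/config before building anything (report any warnings)
Test-Cluster -Node LABNODE1, LABNODE2

# Create a two-node lab cluster (placeholder name and static IP)
New-Cluster -Name LABCLUSTER -Node LABNODE1, LABNODE2 -StaticAddress 192.168.1.50

# Practice a planned failover of a clustered role to the passive node...
Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node LABNODE2

# ...then fail it back, and repeat until it's boring
Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node LABNODE1
```

While you're at it, also pull the plug on the active node a few times so you've seen an *unplanned* failover before production does it for you.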
The domain controller requirement exists because the cluster needs to run in the same security context on both nodes, and local accounts can't provide that; instead, you use a domain account as the cluster service account. If you don't already have a domain, you need to think about redundancy for the domain as well - easily achieved with a pair of domain controllers.
Shared storage is required because both nodes of the cluster need access to the same disks. This can be Fibre Channel SAN storage or iSCSI - whatever you have available, and there are a ton of options if you're starting from scratch.
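If you go the iSCSI route, hooking the shared disks up is scriptable too. A rough PowerShell sketch, assuming the built-in iSCSI initiator cmdlets (Windows Server 2012 or later) and a placeholder target portal address - run the initiator steps on BOTH nodes:

```shell
# Start the iSCSI initiator service and make it start automatically
Start-Service MSiSCSI
Set-Service MSiSCSI -StartupType Automatic

# Point the initiator at your iSCSI target (placeholder address) and
# connect persistently so the session survives reboots
New-IscsiTargetPortal -TargetPortalAddress 192.168.1.100
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true

# Once the disk is initialized and formatted on one node, hand it to the cluster
Get-ClusterAvailableDisk | Add-ClusterDisk
```

The cluster validation wizard will flag it if the storage isn't actually visible from both nodes, which is exactly the kind of thing you want to catch in the lab.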
The other advantage of clustering is that you can do a quick failover when you need to perform maintenance on the active node (Microsoft updates, firmware updates, etc.), so the whole thing doesn't have to be down during that operation.
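That maintenance workflow is a couple of commands. A sketch, again with placeholder node names (the `-Drain` option assumes Windows Server 2012 or later; on older versions you move the groups manually first):

```shell
# Drain the clustered roles off the node you're about to patch
Suspend-ClusterNode -Name NODE1 -Drain

# ...apply updates / firmware on NODE1, reboot as needed...

# Bring the node back into the cluster and fail the roles back
Resume-ClusterNode -Name NODE1 -Failback Immediate
```

Your users see one brief failover blip instead of an outage for the whole patch window.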
There's no shortage of info about clustering on the internet. This is a good start.