Auto Scaling provides the ability to:
- Attach an existing instance (one that was created outside of Auto Scaling) to an Auto Scaling group
- Detach a specific instance from the Auto Scaling group
- Terminate a specific instance in an Auto Scaling group
- Temporarily place an instance in an Auto Scaling group into a standby state
When detaching, terminating, or placing an instance in standby, you can choose whether the Desired Capacity of the Auto Scaling group is decremented (so no replacement instance is launched) or kept the same (so that a replacement instance is launched).
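As a rough illustration of that choice, here is a minimal sketch using boto3. The helper name `remove_instance` and its `action` argument are my own; `client` is assumed to be a boto3 Auto Scaling client created with `boto3.client("autoscaling")`. The three client methods and the `ShouldDecrementDesiredCapacity` flag are the boto3 calls for these operations.

```python
def remove_instance(client, group_name, instance_id, action, decrement):
    """Detach, terminate, or put on standby one instance of an Auto Scaling group.

    client:    assumed to be boto3.client("autoscaling")
    action:    "detach", "standby", or "terminate" (hypothetical labels)
    decrement: True  -> Desired Capacity drops by one, no replacement is launched
               False -> Desired Capacity is kept, so a replacement is launched
    """
    if action == "detach":
        return client.detach_instances(
            InstanceIds=[instance_id],
            AutoScalingGroupName=group_name,
            ShouldDecrementDesiredCapacity=decrement,
        )
    if action == "standby":
        return client.enter_standby(
            InstanceIds=[instance_id],
            AutoScalingGroupName=group_name,
            ShouldDecrementDesiredCapacity=decrement,
        )
    if action == "terminate":
        # This call takes a single instance ID and no group name.
        return client.terminate_instance_in_auto_scaling_group(
            InstanceId=instance_id,
            ShouldDecrementDesiredCapacity=decrement,
        )
    raise ValueError(f"unknown action: {action!r}")
```

For example, `remove_instance(client, "my-asg", "i-0123456789abcdef0", "terminate", False)` would terminate the instance and let Auto Scaling launch a replacement (the group name and instance ID here are placeholders).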
It is generally a good idea to let Auto Scaling launch any new instances, so that all instances are identical. Thus, if you are concerned about a capacity drop, first increment the Desired Capacity to launch a new instance, then terminate the unwanted instance with a capacity decrement, which returns the group to its previous Desired Capacity.
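The increment-then-terminate sequence could be scripted roughly as follows. This is a sketch, not a hardened tool: `replace_instance`, the polling interval, and the timeout are my own choices, and `client` is again assumed to be `boto3.client("autoscaling")`. Note that raising the Desired Capacity above the group's maximum size is rejected by the API, so the group needs headroom there.

```python
import time

def replace_instance(client, group_name, instance_id,
                     poll_seconds=15, timeout_seconds=600):
    """Launch a replacement first, then terminate the unwanted instance.

    Returns the original Desired Capacity, which the group ends up at again.
    """
    def describe():
        resp = client.describe_auto_scaling_groups(
            AutoScalingGroupNames=[group_name])
        return resp["AutoScalingGroups"][0]

    original = describe()["DesiredCapacity"]

    # Step 1: ask for one extra instance (must not exceed the group's MaxSize).
    client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=original + 1,
        HonorCooldown=False,
    )

    # Step 2: wait until enough *other* instances are in service.
    deadline = time.monotonic() + timeout_seconds
    while True:
        others_in_service = sum(
            1 for inst in describe()["Instances"]
            if inst["InstanceId"] != instance_id
            and inst["LifecycleState"] == "InService"
        )
        if others_in_service >= original:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError("replacement instance did not come into service")
        time.sleep(poll_seconds)

    # Step 3: remove the unwanted instance and drop back to the original capacity.
    client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=True,
    )
    return original
```

Waiting on the other instances, rather than on the total count, means the loop still finishes if the unwanted instance is itself no longer in service.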
You are correct that the launched instance is not guaranteed to be in the same AZ as the one being removed. Auto Scaling aims to balance AZs, so it launches new instances in the AZ that has the fewest instances. Let's say there are two AZs with an equal number of instances and you wish to remove an instance from AZ A. Incrementing the Desired Capacity might launch an instance in AZ B. Once the unwanted instance has been removed, AZ B would have two more instances than AZ A. Whether this is a problem depends upon the total number of instances in the Auto Scaling group.
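To make the arithmetic concrete, here is a toy simulation of that scenario. It is not how Auto Scaling is implemented; `launch_instance` and its `prefer` tie-breaker are hypothetical, and the starting count of two instances per AZ is just an example.

```python
def launch_instance(counts, prefer=None):
    """Add one instance to an AZ with the fewest instances.

    On a tie, `prefer` picks the AZ (to model the unlucky case);
    otherwise the first tied AZ is used.
    """
    fewest = min(counts.values())
    candidates = [az for az, n in counts.items() if n == fewest]
    az = prefer if prefer in candidates else candidates[0]
    counts[az] += 1
    return az

counts = {"A": 2, "B": 2}            # two AZs, equally balanced
launch_instance(counts, prefer="B")  # tie: assume the new instance lands in B
counts["A"] -= 1                     # now remove the unwanted instance from A

imbalance = max(counts.values()) - min(counts.values())
print(counts, imbalance)  # {'A': 1, 'B': 3} 2
```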
The recommendation to use multiple AZs exists to handle situations where an AZ might fail. Such a failure would result in a temporary loss of instances while Auto Scaling launches new instances in the remaining AZs. If such a drop is a concern, it is recommended to run extra instances to absorb it. Returning to your question, the same reasoning applies: your Auto Scaling group should have sufficient capacity to handle one instance being removed and replaced, on the assumption that instances will fail occasionally. The extra capacity also helps in the rare situation in which an AZ fails, since the system does not immediately lose 50% of its required minimum capacity (in the case of two AZs).
Bottom line: Have sufficient capacity so that temporarily replacing a bad instance does not have a significant impact on the system. The concern about unbalanced AZs is minor (a difference of at most 2 instances between AZs) compared to the impact of losing 50% of capacity in an AZ outage when only the minimum required capacity is deployed.
At the end of the day, it really comes down to cost vs risk. Using more than 2 AZs can reduce the impact of AZ outages.
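The cost side of that trade-off can be estimated with some simple arithmetic. The sketch below assumes instances are spread evenly across AZs and that only one AZ fails at a time; `instances_needed` and the figure of 6 required instances are illustrative, not taken from your setup.

```python
import math

def instances_needed(required, az_count):
    """Smallest evenly balanced group that still has `required`
    instances running after one whole AZ is lost."""
    if az_count < 2:
        raise ValueError("need at least 2 AZs to survive an AZ outage")
    per_az = math.ceil(required / (az_count - 1))
    return per_az * az_count

required = 6  # example: instances needed to carry the load
for az_count in (2, 3, 4):
    total = instances_needed(required, az_count)
    print(f"{az_count} AZs: run {total} instances ({total - required} extra)")
# 2 AZs: run 12 instances (6 extra)
# 3 AZs: run 9 instances (3 extra)
# 4 AZs: run 8 instances (2 extra)
```

With more AZs, each outage removes a smaller share of the group, so less spare capacity is needed to ride it out.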