We are planning to automate the creation and deletion of VMs in the data centers that power our cloud service. Every new customer gets dedicated VMs (at least 3), so the number of VMs keeps growing. We already have about 2000 VMs running on ESXi. That leaves us with two problems to solve before adopting Terraform:
How do we migrate the existing VMs so that they are managed by Terraform (or should we, at all)? Generating the resource specifications could be scripted, but verifying the plan to ensure that nothing is affected will be a challenge: the sheer volume of VMs, and the fact that they are all live, puts extra pressure on the engineers.
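To make the scripted approach concrete, here is a rough sketch of the kind of tooling I have in mind. It assumes the `vsphere_virtual_machine` resource type from the vSphere provider (we may end up on a different provider), Terraform 1.5+ `import` blocks, and a plan exported with `terraform show -json`. The helper names and the inventory paths are made up for illustration.

```python
import re


def tf_name(vm_name: str) -> str:
    """Turn a VM name into a valid Terraform resource name."""
    # Keep letters, digits, underscores and hyphens; replace everything else.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", vm_name)
    # Prefix names that do not start with a letter or underscore.
    if not re.match(r"[A-Za-z_]", name):
        name = "vm_" + name
    return name


def import_block(vm_name: str, inventory_path: str) -> str:
    """Render one `import` block for an existing VM.

    Assumption: the provider accepts the VM's inventory path as import ID.
    """
    return (
        "import {\n"
        f"  to = vsphere_virtual_machine.{tf_name(vm_name)}\n"
        f'  id = "{inventory_path}"\n'
        "}\n"
    )


def risky_changes(plan: dict) -> list[str]:
    """Return addresses whose planned actions are anything but a no-op.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]
```

With something like this, the review of a 2000-VM plan could shrink to checking that `risky_changes` returns an empty list for each batch, though someone would still have to look closely at any address it does report.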
As the number of VMs increases, the number of .tf files on disk will keep increasing too. We could combine multiple VMs into a single file, but that would make programmatic deletion of individual VMs a bit tricky. Splitting the files across multiple directories is a simple workaround I can think of, but is there a better way to handle scale with Terraform?
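One variant of the single-file option might be to write the combined file in Terraform's JSON syntax (`.tf.json`) instead of HCL, so that deleting a VM becomes a dictionary operation rather than text editing. A minimal sketch, assuming the `vsphere_virtual_machine` resource type; the `remove_vm` helper and the VM names are hypothetical:

```python
import json


def remove_vm(config: dict, name: str,
              rtype: str = "vsphere_virtual_machine") -> bool:
    """Delete one resource from a parsed .tf.json document.

    Returns True if the resource existed and was removed.
    """
    resources = config.get("resource", {}).get(rtype, {})
    return resources.pop(name, None) is not None


# Example: a combined file holding two VMs (attributes omitted for brevity).
text = '{"resource": {"vsphere_virtual_machine": {"cust1_a": {}, "cust1_b": {}}}}'
config = json.loads(text)
removed = remove_vm(config, "cust1_a")
updated_text = json.dumps(config, indent=2)  # write this back to the .tf.json file
```

This does not address the growth of the state itself, only the mechanics of editing a shared file.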
I couldn't find any blog posts that discuss these problems, so I'm looking for advice from practical experience here.