In one of our projects we've been running Service Fabric (SF) hosted in Azure in the usual, self-managed way for a while; i.e. we've been managing most of the SF infrastructure ourselves using Bicep templates.
Now we're considering moving to a Managed Service Fabric setup, where a lot of the infrastructure related to SF is managed more or less automatically. This would reduce the size and complexity of our Bicep templates by a huge amount, among other things. Our goal is to be able to do the same in our new SF solution as in our existing one, but we're still in an early phase.
The issue we've run into now is an inability to access the nodes in the cluster directly. In our existing solution, we've set up a Bastion instance through which we're able to access the nodes directly using RDP. This has been useful for certain debugging scenarios, etc., where we're able to access logs directly on each virtual machine in our cluster.
In the new scenario, setting up a Managed SF cluster has caused a new separate resource group to be created, and we seem to have limited ability to add or change things here. That makes sense, as the whole idea of a managed cluster is to delegate a lot of the administration to the system itself, but it would be nice if we were at least able to access the VMs.
We've tried a couple of things so far. First, we tried to add a separate subnet (for Bastion) to the new virtual network in the new resource group, add an instance of Bastion to another resource group where we have more control, and connect that instance to the new subnet. That, however, produces an error like the following when we try to deploy the Bicep template (with minor edits to anonymize and improve readability):
LinkedAuthorizationFailed: The client 'our-build-client-id' with object id 'xyz123' has permission to perform action 'Microsoft.Network/bastionHosts/write' on scope
'/subscriptions/our-sub-id/resourcegroups/our-RG/providers/Microsoft.Network/bastionHosts/our-bastion-instance';
however, it does not have permission to perform action 'join/action' on the linked scope(s)
'/subscriptions/our-sub-id/resourceGroups/Generated-SF-RG/providers/Microsoft.Network/virtualNetworks/VNet-our-SF-Vnet-Name/subnets/OurAzureBastionSubnet' or the linked scope(s) are invalid.
I understand this to mean that we are not permitted to connect Bastion to the SF VNet.
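For illustration, here is a minimal Bicep sketch of this kind of cross-resource-group Bastion deployment. It is not our actual template: all names and parameters are placeholders, and the API version is an assumption. The point is only to show where the reference to the subnet in the generated resource group comes in, since that reference is what triggers the 'join/action' check on the linked scope.

```bicep
// Sketch only: all names below are placeholders and the API version is an assumption.
param location string = resourceGroup().location
param generatedRgName string    // resource group created for the managed SF cluster
param sfVnetName string         // VNet inside that generated resource group
param bastionSubnetName string  // subnet intended for Bastion inside that VNet

// Reference the VNet and subnet that live in the generated resource group.
resource sfVnet 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: sfVnetName
  scope: resourceGroup(generatedRgName)
}

resource bastionSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-04-01' existing = {
  parent: sfVnet
  name: bastionSubnetName
}

// Public IP for Bastion, deployed to our own resource group.
resource bastionIp 'Microsoft.Network/publicIPAddresses@2023-04-01' = {
  name: 'pip-bastion-placeholder'
  location: location
  sku: {
    name: 'Standard'
  }
  properties: {
    publicIPAllocationMethod: 'Static'
  }
}

// Bastion host in our own resource group, attached to the subnet in the generated one.
// The deploying identity needs 'join/action' on that subnet for this to succeed.
resource bastion 'Microsoft.Network/bastionHosts@2023-04-01' = {
  name: 'our-bastion-instance'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig'
        properties: {
          subnet: {
            id: bastionSubnet.id
          }
          publicIPAddress: {
            id: bastionIp.id
          }
        }
      }
    ]
  }
}
```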
As an alternative experiment, we tried creating a new, separate VNet in the other resource group (the one we have full control over) and then tried to peer that VNet with the one generated for Service Fabric. The (admittedly shaky) reasoning: since two peered VNets should appear as a single network from the outside, if we could link these two VNets together and grant Bastion access to the one in our own resource group, then perhaps that would also let us access the VMs in the other VNet.
Needless to say, we never got that far: the peering operation itself was halted due to insufficient authorization, which makes perfect sense.
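A peering attempt of this kind could be sketched in Bicep roughly as follows. Again, this is a simplified illustration rather than our actual template; the names, parameters, and API version are assumptions. Note that a peering only becomes usable once a matching peering also exists on the remote VNet, which would require write access in the generated resource group as well.

```bicep
// Sketch only: all names below are placeholders and the API version is an assumption.
param ourVnetName string  // VNet in the resource group we control
param sfVnetId string     // full resource ID of the VNet generated for Service Fabric

resource ourVnet 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: ourVnetName
}

// One direction of the peering: from our VNet to the SF VNet.
// Creating it requires permission on the remote VNet too, which is where our deployment was stopped.
resource peerToSf 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: ourVnet
  name: 'peer-to-sf-placeholder'
  properties: {
    remoteVirtualNetwork: {
      id: sfVnetId
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: false
  }
}
```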
So now we're left pondering what the correct way to set this up is, or whether it is possible at all. Could it be that the whole "Managed Service Fabric" concept / architecture is designed to prevent such direct access? That seems a little counterintuitive though, as there is, at the very least, a GUI option related to RDP access, as shown in the image below.
I haven't tested connecting this way yet, as it requires direct RDP access into each node, which means opening RDP ports instead of going via Bastion. That is not an option for security reasons, but I may try it out later out of curiosity, just to confirm that it actually works.
For now, we're still investigating and searching for a way to connect via Bastion. Any pointers would be greatly appreciated.