I have two nodes, mssql-primary and mssql-secondary1, running on Kubernetes with SQL Server 2019 and Always On enabled. Each node works fine on its own, but when I try to join them to an availability group I get the error below:

Failed to join the instance 'mssql-secondary1' to the availability group 'fghyt'. (Microsoft.SqlServer.Management.HadrModel)

For help, click: https://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=16.100.46041.41+(SMO-master-A)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

------------------------------
ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

------------------------------

Cannot join availability group 'fghyt'. Download configuration timeout. Please check primary configuration, network connectivity and firewall setup, then retry the operation. Failed to join local availability replica to availability group 'fghyt'.  The operation encountered SQL Server error 47106 and has been rolled back.  Check the SQL Server error log for more details.  When the cause of the

primary.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-primary-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql-primary
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mssql-primary
    spec:
      hostname: mssql-primary
      terminationGracePeriodSeconds: 10
      securityContext:
        fsGroup: 1000
      volumes:
        - name: task-pv-storage-primary
          persistentVolumeClaim:
            claimName: mssql-pv-claim-primary
      containers:
      - name:  mssql-primary
        image: mcr.microsoft.com/mssql/server:2019-latest
        env:
        - name:  MSSQL_PID
          value:  "Developer"
        - name:  ACCEPT_EULA
          value: "Y"
        - name:  MSSQL_ENABLE_HADR
          value: "1"
        - name:  MSSQL_AGENT_ENABLED
          value: "true"
        - name:  MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name:  mssql
              key: SA_PASSWORD
        resources:
          limits:
            memory: 3G
        volumeMounts:
          - name:  task-pv-storage-primary
            mountPath:  /var/opt/mssql
    hostname: mssql-primary

Secondary1.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-secondary1-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql-secondary1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mssql-secondary1
    spec:
      hostname: mssql-secondary1  
      terminationGracePeriodSeconds: 10
      securityContext:
        fsGroup: 1000
      volumes:
        - name: task-pv-storage-secondary1
          persistentVolumeClaim:
            claimName: mssql-pv-claim-secondary1
      containers:
      - name:  mssql-secondary1
        image: mcr.microsoft.com/mssql/server:2019-latest
        env:
        - name:  MSSQL_PID
          value:  "Developer"
        - name:  ACCEPT_EULA
          value: "Y"
        - name:  MSSQL_ENABLE_HADR
          value: "1"
        - name:  MSSQL_AGENT_ENABLED
          value: "true"
        - name:  MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name:  mssql
              key: SA_PASSWORD
        resources:
          limits:
            memory: 3G
        volumeMounts:
          - name:  task-pv-storage-secondary1
            mountPath:  /var/opt/mssql
    hostname: mssql-secondary1
Payam Khaninejad

1 Answer

Based on your error, I found this similar issue.

It basically says:

The hostname should be of max 15 characters for the node specifier while creating the availability group or else it won't recognize the remaining characters.
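As a quick way to check names against that limit, here is a minimal Python sketch. The 15-character limit is taken from the linked issue and is treated as an assumption here, and the helper name `exceeds_limit` is hypothetical.

```python
# Limit quoted in the linked issue; treated as an assumption, not verified here.
MAX_HOSTNAME_LENGTH = 15


def exceeds_limit(hostname: str, limit: int = MAX_HOSTNAME_LENGTH) -> bool:
    """Return True when the hostname is longer than the given limit."""
    return len(hostname) > limit


for name in ("mssql-primary", "mssql-secondary1", "mssql-sec1"):
    print(name, len(name), exceeds_limit(name))
# mssql-primary 13 False
# mssql-secondary1 16 True
# mssql-sec1 10 False
```

So `mssql-secondary1` is one character over the quoted limit, while `mssql-primary` and a shorter name such as `mssql-sec1` fit within it.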

If you encounter any problems during renaming, see also this issue, where reinstalling Windows clustering helped. If that doesn't help, try reconfiguring the availability group. The following videos show it step by step.
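Separately, since the error message also asks you to check network connectivity and firewall setup, it may be worth testing whether the secondary can open a TCP connection to the primary's mirroring endpoint. The sketch below is only an illustration: the helper name `can_connect` is hypothetical, and port 5022 is an assumption (it is the port commonly used for the database mirroring endpoint, but use whatever port your endpoint was actually created with).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("mssql-primary", 5022)` could be run from a pod in the same namespace that has Python available. This assumes the name `mssql-primary` resolves from that pod; in Kubernetes that typically requires a Service in front of the pod, because the pod `hostname` field alone does not make the name resolvable from other pods. A `False` result would point at name resolution, the endpoint port, or a network policy rather than at the hostname length.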

kkopczak
  • My hostname is less than 15 characters. – Payam Khaninejad Oct 11 '21 at 15:27
  • Is it? Running `'mssql-secondary1'.Length` in powershell gives me 16. Which, on Mondays, is greater than 15. – Ben Thul Oct 11 '21 at 16:06
  • @BenThul I changed it to mssql-sec1 and the result is the same. The second replica is not connected to the primary replica, the connected state is connected. – Payam Khaninejad Oct 12 '21 at 12:37
  • Are your pods running (`kubectl get pods`)? Could you check that, please? Could you [exec](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/) into each pod and ping the other one (e.g. from primary to sec1)? – kkopczak Oct 25 '21 at 15:13