
New to K8s. So far I have the following:

  • docker-ce-19.03.8
  • docker-ce-cli-19.03.8
  • containerd.io-1.2.13
  • kubelet-1.18.5
  • kubeadm-1.18.5
  • kubectl-1.18.5
  • etcd-3.4.10
  • Use Flannel for Pod Overlay Net
  • Performed all of the host-level work (SELinux permissive, swapoff, etc.)
  • All CentOS 7 in an on-prem vSphere environment (6.7U3)

I've built all my configs and currently have:

  • a 3-node external/stand-alone etcd cluster with peer-to-peer and client-server encrypted transmissions
  • a 3-node control-plane cluster -- kubeadm init is bootstrapped with x509s and targets the 3 external etcd members (so stacked etcd never happens)
  • HAProxy and Keepalived are installed on two of the etcd cluster members, load-balancing access to the API server endpoints on the control plane (TCP6443)
  • 6-worker nodes
  • Storage configured with the in-tree VMware Cloud Provider (I know it's deprecated) -- and yes, this is my DEFAULT SC
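
For reference, the default SC can be confirmed like this (the class name is just a placeholder; the in-tree VCP registers the kubernetes.io/vsphere-volume provisioner):

kubectl get storageclass
# the default class shows "(default)" next to its NAME; PROVISIONER should read kubernetes.io/vsphere-volume
kubectl describe storageclass <my-vcp-class>   # placeholder name; look for IsDefaultClass: Yes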

Status Checks:

  • kubectl cluster-info reports:
[me@km-01 pods]$ kubectl cluster-info
Kubernetes master is running at https://k8snlb:6443
KubeDNS is running at https://k8snlb:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

kubectl get all --all-namespaces reports:

[me@km-01 pods]$ kubectl get all --all-namespaces -owide
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE   IP            NODE                      NOMINATED NODE   READINESS GATES
ag1           pod/mssql-operator-68bcc684c4-rbzvn                   1/1     Running   0          86m   10.10.4.133   kw-02.bogus.local   <none>           <none>
kube-system   pod/coredns-66bff467f8-k6m94                          1/1     Running   4          20h   10.10.0.11    km-01.bogus.local   <none>           <none>
kube-system   pod/coredns-66bff467f8-v848r                          1/1     Running   4          20h   10.10.0.10    km-01.bogus.local   <none>           <none>
kube-system   pod/kube-apiserver-km-01.bogus.local            1/1     Running   8          10h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-controller-manager-km-01.bogus.local   1/1     Running   2          10h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-7l76c                       1/1     Running   0          10h   x.x.x..30   kw-01.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-8kft7                       1/1     Running   0          10h   x.x.x..33   kw-04.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-r5kqv                       1/1     Running   0          10h   x.x.x..34   kw-05.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-t6xcd                       1/1     Running   0          10h   x.x.x..35   kw-06.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-vhnx8                       1/1     Running   0          10h   x.x.x..32   kw-03.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-xdk2n                       1/1     Running   0          10h   x.x.x..31   kw-02.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-z4kfk                       1/1     Running   4          20h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-49hsl                                  1/1     Running   0          10h   x.x.x..35   kw-06.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-62klh                                  1/1     Running   0          10h   x.x.x..34   kw-05.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-64d5t                                  1/1     Running   0          10h   x.x.x..30   kw-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-6ch42                                  1/1     Running   4          20h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-9css4                                  1/1     Running   0          10h   x.x.x..32   kw-03.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-hgrx8                                  1/1     Running   0          10h   x.x.x..33   kw-04.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-ljlsh                                  1/1     Running   0          10h   x.x.x..31   kw-02.bogus.local   <none>           <none>
kube-system   pod/kube-scheduler-km-01.bogus.local            1/1     Running   5          20h   x.x.x..25   km-01.bogus.local   <none>           <none>

NAMESPACE     NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                  AGE   SELECTOR
ag1           service/ag1-primary     NodePort    10.104.183.81    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30405/TCP           85m   role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr
ag1           service/ag1-secondary   NodePort    10.102.52.31     x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30713/TCP           85m   role.ag.mssql.microsoft.com/ag1=secondary,type=sqlservr
ag1           service/mssql1          NodePort    10.96.166.108    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:32439/TCP           86m   name=mssql1,type=sqlservr
ag1           service/mssql2          NodePort    10.109.146.58    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30636/TCP           86m   name=mssql2,type=sqlservr
ag1           service/mssql3          NodePort    10.101.234.186   x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30862/TCP           86m   name=mssql3,type=sqlservr
default       service/kubernetes      ClusterIP   10.96.0.1        <none>                                                                    443/TCP                  23h   <none>
kube-system   service/kube-dns        ClusterIP   10.96.0.10       <none>                                                                    53/UDP,53/TCP,9153/TCP   20h   k8s-app=kube-dns

NAMESPACE     NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS     IMAGES                                   SELECTOR
kube-system   daemonset.apps/kube-flannel-ds-amd64     7         7         7       7            7           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-amd64     app=flannel
kube-system   daemonset.apps/kube-flannel-ds-arm       0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-arm       app=flannel
kube-system   daemonset.apps/kube-flannel-ds-arm64     0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-arm64     app=flannel
kube-system   daemonset.apps/kube-flannel-ds-ppc64le   0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-ppc64le   app=flannel
kube-system   daemonset.apps/kube-flannel-ds-s390x     0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-s390x     app=flannel
kube-system   daemonset.apps/kube-proxy                7         7         7       7            7           kubernetes.io/os=linux   20h   kube-proxy     k8s.gcr.io/kube-proxy:v1.18.7            k8s-app=kube-proxy

NAMESPACE     NAME                             READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS       IMAGES                                          SELECTOR
ag1           deployment.apps/mssql-operator   1/1     1            1           86m   mssql-operator   mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu   app=mssql-operator
kube-system   deployment.apps/coredns          2/2     2            2           20h   coredns          k8s.gcr.io/coredns:1.6.7                        k8s-app=kube-dns

NAMESPACE     NAME                                        DESIRED   CURRENT   READY   AGE   CONTAINERS       IMAGES                                          SELECTOR
ag1           replicaset.apps/mssql-operator-68bcc684c4   1         1         1       86m   mssql-operator   mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu   app=mssql-operator,pod-template-hash=68bcc684c4
kube-system   replicaset.apps/coredns-66bff467f8          2         2         2       20h   coredns          k8s.gcr.io/coredns:1.6.7                        k8s-app=kube-dns,pod-template-hash=66bff467f8

To the problem: there are a number of articles covering a SQL 2019 HA build. Every single one of them, however, is in the cloud, whereas mine is on-prem in a vSphere environment. They all appear to be very simple: run 3 manifests in this order: operator.yaml, sql.yaml, and ag-service.yaml.

My YAMLs are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files

In the blogs that actually screenshot the environment afterward, there are at least 7 pods (1 operator, 3 SQL init, 3 SQL). If you look at my --all-namespaces output above, everything I do have is in a Running state, but there are no pods other than the operator...???

I actually broke the control plane back down to a single node just to isolate the logs. /var/log/containers/* and /var/log/pods/* contain nothing of value indicating a storage problem or any other reason the pods are non-existent. It's probably also worth noting that I started with the latest SQL 2019 tag (2019-latest), but when I got the same behavior there, I decided to try the older bits since so many blogs are based on CTP 2.1.

I can create PVs and PVCs using the VCP storage provider. I have my Secrets and can see them in the Secrets store.
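
For reference, the key names in the secret can be checked against the secretKeyRef in the manifest below without exposing any values (sql-secrets/sapassword are the names my manifest uses):

kubectl -n ag1 describe secret sql-secrets                                          # lists key names and byte sizes only
kubectl -n ag1 get secret sql-secrets -o jsonpath='{.data.sapassword}' | base64 -d  # decode a single key to verify it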

I'm at a loss to explain why the pods are missing or where to look next. After checking journalctl, the daemons themselves, and /var/log, I don't see any indication there's even an attempt to create them -- the kubectl apply -f mssql-server2019.yaml that I adapted runs to completion without error, reporting that 3 SqlServer objects and 3 services get created. But here is the file anyway, targeting CTP 2.1:

cat << EOF > mssql-server2019.yaml
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql1, type: sqlservr}
  name: mssql1
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql1, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql2, type: sqlservr}
  name: mssql2
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql2, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql2, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql3, type: sqlservr}
  name: mssql3
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql3, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql3, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
EOF
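
For reference, the SqlServer objects themselves (and any events the operator or controllers emit about them) can be checked directly:

kubectl -n ag1 get sqlservers.mssql.microsoft.com
kubectl -n ag1 describe sqlservers.mssql.microsoft.com mssql1
kubectl -n ag1 get events --sort-by=.metadata.creationTimestamp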

Edit 1: kubectl logs -n ag1 mssql-operator-*

[sqlservers] 2020/08/14 14:36:48 Creating custom resource definition
[sqlservers] 2020/08/14 14:36:48 Created custom resource definition
[sqlservers] 2020/08/14 14:36:48 Waiting for custom resource definition to be available
[sqlservers] 2020/08/14 14:36:49 Watching for resources...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql1 in namespace ag1 ...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap ag1
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error creating ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql2 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql3 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}

I've looked over my operator.yaml and mssql-server2019.yaml (specifically around kind: SqlServer, since that seems to be where it's failing) and can't identify any glaring inconsistencies or differences.
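
One more check worth noting: a server-side dry run pushes the objects through the apiserver's admission and validation without persisting anything, so manifest problems should surface there:

kubectl apply --dry-run=server -f mssql-server2019.yaml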

thepip3r

2 Answers


So your operator is running:

ag1           pod/mssql-operator-68bcc684c4-rbzvn                   1/1     Running   0          86m   10.10.4.133   kw-02.bogus.local   <none>           <none>

I would start by looking at the logs there:

kubectl -n ag1 logs pod/mssql-operator-68bcc684c4-rbzvn

Most likely it needs to interact with a cloud provider (e.g. Azure) and VMware is not supported, but check what the logs say.

Update:

Based on the logs you posted, it looks like you are using K8s 1.18 and the operator is incompatible with it. It creates the ConfigMap, but then fails to parse the object the kube-apiserver returns -- the parse error points at the managedFields metadata that 1.18 adds to objects.
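
You can see the field it chokes on by pulling the ConfigMap the operator did manage to create (that's my read of where the parse error points, anyway):

kubectl -n ag1 get configmap ag1 -o yaml   # note the metadata.managedFields section in the output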

✌️

Rico
  • Thanks Rico -- does this mean that the SQL install is specific to the K8s version? If so, I suppose the answer is to either backport the K8s version (seems bad) or hit MS up for updated bits. – thepip3r Aug 14 '20 at 20:06
    Yep, you've got it. Use an older K8s version or have MS fix the issue. I couldn't find the source code for the operator on github so I'm not sure where it is. – Rico Aug 14 '20 at 20:25
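
If you do go the older-release route, here is a sketch of pinning the CentOS 7 packages at install time (this assumes the standard upstream Kubernetes yum repo is configured, the 1.17.x patch level is arbitrary, and it's meant for a rebuild on fresh nodes rather than an in-place downgrade):

sudo yum install -y kubelet-1.17.9-0 kubeadm-1.17.9-0 kubectl-1.17.9-0 --disableexcludes=kubernetes
sudo systemctl enable --now kubelet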

My YAMLs are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files

Run 3 manifests in this order: operator.yaml, sql.yaml, and ag-service.yaml.

I have just run it on my GKE cluster and got a similar result when running only these 3 files.

If you run it without first preparing the PV and PVC (.././sample-deployment-script/templates/pv*.yaml), this is what happens:

$ git clone https://github.com/microsoft/sql-server-samples.git
$ cd sql-server-samples/samples/features/high\ availability/Kubernetes/sample-manifest-files/

$ kubectl create -f operator.yaml
namespace/ag1 created
serviceaccount/mssql-operator created
clusterrole.rbac.authorization.k8s.io/mssql-operator-ag1 created
clusterrolebinding.rbac.authorization.k8s.io/mssql-operator-ag1 created
deployment.apps/mssql-operator created

$ kubectl create -f sqlserver.yaml 
sqlserver.mssql.microsoft.com/mssql1 created
service/mssql1 created
sqlserver.mssql.microsoft.com/mssql2 created
service/mssql2 created
sqlserver.mssql.microsoft.com/mssql3 created
service/mssql3 created

$ kubectl create -f ag-services.yaml 
service/ag1-primary created
service/ag1-secondary created

You'll have:

kubectl get pods -n ag1
NAME                              READY   STATUS                       RESTARTS   AGE
mssql-initialize-mssql1-js4zc     0/1     CreateContainerConfigError   0          6m12s
mssql-initialize-mssql2-72d8n     0/1     CreateContainerConfigError   0          6m8s
mssql-initialize-mssql3-h4mr9     0/1     CreateContainerConfigError   0          6m6s
mssql-operator-658558b57d-6xd95   1/1     Running                      0          6m33s
mssql1-0                          1/2     CrashLoopBackOff             5          6m12s
mssql2-0                          1/2     CrashLoopBackOff             5          6m9s
mssql3-0                          0/2     Pending                      0          6m6s

I see that the failed mssql<N> pods are part of statefulset.apps/mssql<N>, and the mssql-initialize-mssql<N> pods are part of job.batch/mssql-initialize-mssql<N>.
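
CreateContainerConfigError normally means a ConfigMap or Secret referenced in the container spec is missing; describe and the events stream show exactly which reference fails:

kubectl -n ag1 describe pod mssql-initialize-mssql1-js4zc
kubectl -n ag1 get events --sort-by=.metadata.creationTimestamp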

After adding the PV and PVC, it looks like this:

 $ kubectl get all -n ag1 
NAME                                  READY   STATUS    RESTARTS   AGE
mssql-operator-658558b57d-pgx74       1/1     Running   0          20m

And 3 sqlservers.mssql.microsoft.com objects

$ kubectl get sqlservers.mssql.microsoft.com -n ag1 
NAME     AGE
mssql1   64m
mssql2   64m
mssql3   64m

That is, the cluster ends up looking exactly as those files specify -- the same thing you are seeing.


However, if you run:

sql-server-samples/samples/features/high availability/Kubernetes/sample-deployment-script/$ ./deploy-ag.py deploy --dry-run

the configs will be generated automatically.

Run it without --dry-run, using those generated configs (and with the PV+PVC properly set up), and it gives all 7 pods.

It will be useful to compare the auto-generated configs with the ones you have (and to compare running only the 3 sample files vs. the full deploy-ag.py deployment -- a crude diff is sketched below).
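
A simple way to do that comparison, assuming the dry run leaves the rendered manifests on disk (the generated path below is a placeholder for wherever your copies actually live):

diff <path-to-generated>/sqlserver.yaml mssql-server2019.yaml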

P.S.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.11-dispatcher"
Server Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.12-gke.2"
Nick