New to K8s. So far I have the following:
- docker-ce-19.03.8
- docker-ce-cli-19.03.8
- containerd.io-1.2.13
- kubelet-1.18.5
- kubeadm-1.18.5
- kubectl-1.18.5
- etcd-3.4.10
- Flannel for the pod overlay network
- Performed all of the host-level work (SELinux permissive, swapoff, etc.)
- All CentOS 7 in an on-prem vSphere environment (6.7U3)
I've built all my configs and currently have:
- a 3-node external/stand-alone etcd cluster with peer-to-peer and client-server encrypted transmissions
- a 3-node control plane cluster; kubeadm init is bootstrapped with x509 certs and targets the 3 etcd members (so stacked etcd never happens)
- HAProxy and Keepalived are installed on two of the etcd cluster members, load-balancing access to the API server endpoints on the control plane (TCP 6443)
- 6 worker nodes
- Storage configured with the in-tree VMware Cloud Provider (I know it's deprecated), and yes, this is my DEFAULT SC
Status Checks:
- kubectl cluster-info reports:
[me@km-01 pods]$ kubectl cluster-info
Kubernetes master is running at https://k8snlb:6443
KubeDNS is running at https://k8snlb:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
- kubectl get all --all-namespaces reports:
[me@km-01 pods]$ kubectl get all --all-namespaces -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ag1 pod/mssql-operator-68bcc684c4-rbzvn 1/1 Running 0 86m 10.10.4.133 kw-02.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-k6m94 1/1 Running 4 20h 10.10.0.11 km-01.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-v848r 1/1 Running 4 20h 10.10.0.10 km-01.bogus.local <none> <none>
kube-system pod/kube-apiserver-km-01.bogus.local 1/1 Running 8 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-controller-manager-km-01.bogus.local 1/1 Running 2 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-7l76c 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-8kft7 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-r5kqv 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-t6xcd 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-vhnx8 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-xdk2n 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-z4kfk 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-49hsl 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-proxy-62klh 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-proxy-64d5t 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-proxy-6ch42 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-9css4 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-proxy-hgrx8 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-proxy-ljlsh 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-scheduler-km-01.bogus.local 1/1 Running 5 20h x.x.x..25 km-01.bogus.local <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
ag1 service/ag1-primary NodePort 10.104.183.81 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30405/TCP 85m role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr
ag1 service/ag1-secondary NodePort 10.102.52.31 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30713/TCP 85m role.ag.mssql.microsoft.com/ag1=secondary,type=sqlservr
ag1 service/mssql1 NodePort 10.96.166.108 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:32439/TCP 86m name=mssql1,type=sqlservr
ag1 service/mssql2 NodePort 10.109.146.58 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30636/TCP 86m name=mssql2,type=sqlservr
ag1 service/mssql3 NodePort 10.101.234.186 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30862/TCP 86m name=mssql3,type=sqlservr
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 20h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kube-flannel-ds-amd64 7 7 7 7 7 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-amd64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-ppc64le app=flannel
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-s390x app=flannel
kube-system daemonset.apps/kube-proxy 7 7 7 7 7 kubernetes.io/os=linux 20h kube-proxy k8s.gcr.io/kube-proxy:v1.18.7 k8s-app=kube-proxy
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
ag1 deployment.apps/mssql-operator 1/1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator
kube-system deployment.apps/coredns 2/2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
ag1 replicaset.apps/mssql-operator-68bcc684c4 1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator,pod-template-hash=68bcc684c4
kube-system replicaset.apps/coredns-66bff467f8 2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns,pod-template-hash=66bff467f8
To the problem: There are a number of articles describing a SQL Server 2019 HA build. Every one I've found, however, runs in the cloud, whereas mine is on-prem in a vSphere environment. The procedure looks very simple: apply three manifests in order (operator.yaml, sql.yaml, then ag-service.yaml).
My YAML files are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files
In the blogs that screenshot the environment afterward, there are at least 7 pods (1 operator, 3 SQL init, 3 SQL). In my kubectl get all --all-namespaces output above, everything that exists is in a Running state, but there are no pods other than the operator.
I even cut the control plane back to a single node to try to isolate the logs. /var/log/container/* and /var/log/pod/* contain nothing that points to a storage problem or any other reason the pods don't exist. It's probably also worth noting that I started with the latest SQL Server 2019 tag, 2019-latest; when I got the same behavior there, I switched to the older bits, since so many blogs are based on CTP 2.1.
I can create PVs and PVCs using the VCP storage provider. I have my Secrets and can see them in the Secrets store.
I'm at a loss to explain why the pods are missing or where to look next. I've checked journalctl, the daemons themselves, and /var/log, and I see no indication that there is even an attempt to create them. The kubectl apply -f mssql-server2019.yaml that I adapted runs to completion without error and reports that 3 SQL objects and 3 SQL services were created. Here is the file anyway, targeting CTP 2.1:
cat << EOF > mssql-server2019.yaml
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql1, type: sqlservr}
  name: mssql1
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
  ports:
    - {name: tds, port: 1433}
  selector: {name: mssql1, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql2, type: sqlservr}
  name: mssql2
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql2, namespace: ag1}
spec:
  ports:
    - {name: tds, port: 1433}
  selector: {name: mssql2, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql3, type: sqlservr}
  name: mssql3
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql3, namespace: ag1}
spec:
  ports:
    - {name: tds, port: 1433}
  selector: {name: mssql3, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
EOF
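The three SqlServer/Service pairs in that file differ only in the instance name, so the file could also be generated instead of hand-copied. Below is a minimal sketch in Python, not part of my actual setup. The indentation is an assumption based on the layout of the linked sample files, the externalIPs list is left out because those addresses are redacted here, and the render_manifest helper name is hypothetical.

```python
from string import Template

# One SqlServer + Service pair; only $name changes between instances.
# Indentation is an assumption modeled on the linked sample manifests.
PAIR = Template("""\
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: $name, type: sqlservr}
  name: $name
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: $name, namespace: ag1}
spec:
  ports:
    - {name: tds, port: 1433}
  selector: {name: $name, type: sqlservr}
  type: NodePort
""")


def render_manifest(names):
    """Return one multi-document YAML string with a pair per instance name."""
    return "---\n".join(PAIR.substitute(name=n) for n in names)


if __name__ == "__main__":
    # Redirect stdout to a file and feed that file to kubectl apply -f.
    print(render_manifest(["mssql1", "mssql2", "mssql3"]))
```

Generating the pairs from one template at least rules out copy/paste drift between the three instances as a cause.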
Edit 1: kubectl logs -n ag1 mssql-operator-*
[sqlservers] 2020/08/14 14:36:48 Creating custom resource definition
[sqlservers] 2020/08/14 14:36:48 Created custom resource definition
[sqlservers] 2020/08/14 14:36:48 Waiting for custom resource definition to be available
[sqlservers] 2020/08/14 14:36:49 Watching for resources...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql1 in namespace ag1 ...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap ag1
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error creating ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql2 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql3 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
I've looked over my operator and mssql-server2019 YAML files (specifically around kind: SqlServer, since that seems to be where it's failing) and can't identify any glaring inconsistencies or differences.
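One thing worth ruling out is that the ConfigMap payload quoted in the error is itself malformed. The sketch below is a quick sanity check in Python. It assumes the fragment is copied faithfully from the error line above (the fieldsV1 value under managedFields) and feeds it to the standard json module. If it parses, the stored object is probably fine, and the failure is more likely in the JSON decoder bundled with the operator image, which seems to stop at the escaped quotes in the k:{\"uid\":...} key. That is an inference from the error text only, not something checked against the operator's source.

```python
import json

# Fragment copied from the operator error above: the "fieldsV1" value.
# Raw string so the backslash-escaped quotes inside the key stay verbatim.
FIELDS_V1 = r'{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}'

# A standards-compliant parser should accept escaped quotes inside a key.
parsed = json.loads(FIELDS_V1)
owner_refs = parsed["f:metadata"]["f:ownerReferences"]

# The awkward key is the one the operator's error message points at.
for key in sorted(owner_refs):
    print(repr(key))
```

The fragment loads without raising, and the key containing the embedded quotes comes back intact. This suggests the object the API server returned is valid JSON and the parsing problem is on the operator side.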