
I'm having issues with my Raspberry Pi Kubernetes implementation.

Problem:

The cert-manager Let's Encrypt ACME challenge is stuck waiting because of a 401 status code on a bare-metal Kubernetes install.

Setup

Platform: Raspberry Pi 4

OS: Ubuntu Server 20.04.3 LTS 64 bit

Ingress: Nginx

Load balancer: MetalLB

Networking: Calico

I installed MetalLB and ingress-nginx via Helm using:

helm install metallb metallb/metallb --namespace kube-system \
    --set 'configInline.address-pools[0].name=default' \
    --set 'configInline.address-pools[0].protocol=layer2' \
    --set 'configInline.address-pools[0].addresses[0]=<ip-range>'

and

helm install ingress-nginx ingress-nginx/ingress-nginx --namespace kube-system

My Let's Encrypt ClusterIssuer looks like this:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    email: <email redacted>
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

My nginx Ingress setup looks like this:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: "nextcloud" # Same namespace as the deployment
  name: "nextcloud-ingress" # Name of the ingress (see kubectl get ingress -A)
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod" # Encrypt using the ClusterIssuer deployed while setting up Cert-Manager
    nginx.ingress.kubernetes.io/proxy-body-size:  "125m" # Increase the size of the maximum allowed size of the client request body
spec:
  tls:
  - hosts:
    - "nextcloud.<domain redacted>" # Host to access nextcloud
    secretName: "nextcloud-prod-tls" # Name of the certificate (see kubectl get certificate -A)
  rules:
  - host: "nextcloud.<domain redacted>" # Host to access nextcloud
    http:
      paths:
        - path: /  # We will access NextCloud via the URL https://nextcloud.<domain.com>/
          pathType: Prefix
          backend:
            service: 
              name: "nextcloud-server" # Mapping to the service (see kubectl get services -n nextcloud)
              port: 
                number: 80 # Mapping to the port (see kubectl get services -n nextcloud)
---

Debugging

When I look at the ingress controller logs (different namespace) I see:

Service "nextcloud/cm-acme-http-solver-9tccf" does not have any active Endpoint.

However, the endpoint appears to exist when I run kubectl get endpoints -A.

My certificate exists as:

kubectl get certificate -n nextcloud
NAME                 READY   SECRET               AGE
nextcloud-prod-tls   False   nextcloud-prod-tls   3h58m

Following cert-manager's recommended debugging steps, I traced the issue to the challenge, whose status shows:

Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for HTTP-01 challenge propagation: wrong status code '401', expected '200'
  State:       pending
Events:        <none>
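
For reference, the URL behind that self-check can be requested by hand to see what actually answers. This is a minimal sketch, assuming the Challenge resource exposes the host and token as `spec.dnsName` and `spec.token`; the helper names `challenge_url` and `probe_challenge` are made up for illustration, and `<challenge-name>` is a placeholder.

```shell
# Build the HTTP-01 URL that the challenge self-check requests.
challenge_url() {
  # $1 = host name, $2 = challenge token
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Print only the HTTP status code returned for that URL.
probe_challenge() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(challenge_url "$1" "$2")"
}

# Example (needs cluster access; replace <challenge-name>):
#   host=$(kubectl get challenge <challenge-name> -n nextcloud -o jsonpath='{.spec.dnsName}')
#   token=$(kubectl get challenge <challenge-name> -n nextcloud -o jsonpath='{.spec.token}')
#   probe_challenge "$host" "$token"
```

Running the probe both from a cluster node and from an outside network shows whether the 401 comes from the ingress or from something in front of it.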

I'm kind of stuck. I've been googling my heart out, but there doesn't seem to be much on this. I'm guessing I've made a mistake in the setup, but I've mainly been following the documentation on the relevant pages. Any pointers would be greatly appreciated :). If you need any additional info, let me know; this is already quite long, so I tried to include only what I thought were the problem points.

Llewyn S
  • If I read your setup correctly, you get a 401 on the file that cert-manager (certbot?) requests. This is probably because the call from Let's Encrypt ends up at something you have password protected. Your nginx logs should show a request to a folder named .well-known or similar, followed by a generic name. That path needs to be accessible to an outsider (certbot in this example). For me it worked best to make an exception so that this specific directory is served directly by nginx, something like a location .well-known block in the nginx config. – Dennis Nolte Sep 07 '21 at 08:31
  • The idea of @DennisNolte is good. Could you try it and let us know if it works? – Mikołaj Głodziak Sep 07 '21 at 08:49
  • @DennisNolte Thanks for your reply. I couldn't find any attempts to access the /.well-known/ directory in the nginx controller log, but I could find a reference to it and my domain in the nginx.conf. Are you suggesting that I change the location /.well-known/ to another directory in the nginx.conf? – Llewyn S Sep 07 '21 at 12:32
  • Did you see [this topic](https://github.com/jetstack/cert-manager/issues/2517)? Is it similar to your problem? – Mikołaj Głodziak Sep 08 '21 at 09:10
  • @MikołajGłodziak I had a look through it but couldn't find anything that applied. Seems like most people don't get 401s... I have no idea how to debug as there's no permissions issues in the ingress log. Just the certificate challenge gets the 401 error. I'm wondering if I can somehow add cert-manager to an RBAC group or something... – Llewyn S Sep 08 '21 at 12:52
  • You will need to figure out what the cert challenge is actually accessing your server with; this should be in the ingress logs. The file is usually called something like access.log, possibly with a domain name in front. In that file you will find the HTTP call nginx is receiving. If you don't find any entries there, make some requests manually to confirm it is the right log file. The same is true for your backend, but this might have to be configured first. Once you know the correct call, you can verify that you allow access to that directory and determine who should serve it. – Dennis Nolte Sep 08 '21 at 13:59
  • Hello @LlewynS. Any updates? – Wytrzymały Wiktor Sep 14 '21 at 12:00
  • Not yet, I couldn't track down any helpful logs. – Llewyn S Sep 19 '21 at 14:24
  • OK, I set up a minimal example with no cert-manager. I found that when I tried to connect to my FQDN from my home network, it was actually taking me to the router login... I am now able to connect via the FQDN over VPN or from a mobile phone. This doesn't really resolve the cert-manager 401 issue, unless routing via the home network is also sending its request to the router login rather than the ingress... – Llewyn S Sep 30 '21 at 01:20

2 Answers

In my case the ClusterIssuer was pointing to the wrong ingress class:

kubectl edit clusterissuer XXXX

solvers:
- http01:
    ingress:
      class: nginternal

Make sure the class in the solver matches the class used by your Ingress.
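
To check this on the setup from the question, the class in the ClusterIssuer can be compared with the class the ingress controller serves. A rough sketch, assuming the issuer is named `letsencrypt-prod` as in the question and that the first solver is the HTTP-01 one; `check_class` is a hypothetical helper, not part of any tool.

```shell
# Report whether the solver's ingress class matches the controller's class.
check_class() {
  # $1 = class from the ClusterIssuer, $2 = class served by the controller
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: issuer uses '$1', controller serves '$2'"
  fi
}

# Example (needs cluster access):
#   issuer_class=$(kubectl get clusterissuer letsencrypt-prod \
#     -o jsonpath='{.spec.acme.solvers[0].http01.ingress.class}')
#   kubectl get ingressclass        # lists the classes available in the cluster
#   check_class "$issuer_class" nginx
```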

user954014

The cause of this problem was that my router was not capable of NAT loopback, so requests to my own domain from inside the home network reached the router's login page instead of the ingress.

Getting a router that had this functionality resolved my issue. Hopefully that helps anyone having issues like this.
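
A quick way to spot this symptom is to request the same URL once from inside the home network and once from outside (for example over a mobile connection), then compare the status codes. The sketch below is illustrative only: `status_of` and `diagnose_loopback` are hypothetical helpers, and the interpretation is a heuristic rather than a definitive test.

```shell
# Print the HTTP status code for a URL.
status_of() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Compare a status code seen inside the LAN with one seen outside.
diagnose_loopback() {
  # $1 = code from inside the LAN, $2 = code from outside
  if [ "$1" = "$2" ]; then
    echo "same response ($1): loopback is probably not the problem"
  else
    echo "inside=$1 outside=$2: requests from the LAN may be hitting the router"
  fi
}

# Example: run status_of on each network, then compare the two numbers:
#   status_of "http://nextcloud.<domain>/"
#   diagnose_loopback 401 200
```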

Llewyn S