
I'm trying to deploy a mainnet archive node to a GKE cluster using the erigon Docker image (thorax/erigon). I have successfully deployed a Geth node with a configuration similar to the one below, but the same approach has not worked for erigon.

Below is my YAML deployment file:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: erigon-mainnet
  namespace: erigon-mainnet
spec:
  selector:
    matchLabels:
      app: erigon-mainnet
  replicas: 2
  serviceName: erigon-mainnet
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: erigon-mainnet
    spec:
      terminationGracePeriodSeconds: 300
      containers:
        - name: erigon
          image: docker.io/thorax/erigon
          ports:
            - containerPort: 8545
            - containerPort: 8546
            - { containerPort: 30303, protocol: TCP }
            - { containerPort: 30303, protocol: UDP }
          args:
            [
              "--datadir=/mainnet",
              "--chain=mainnet",
              "--http",
              "--http.addr=0.0.0.0",
              "--http.api=eth,net,web3",
              "--http.vhosts=*",
              "--http.corsdomain=*",
              "--ws",
              "--ws.addr=0.0.0.0",
              "--ws.api=eth,net,web3",
              "--ws.origins=*",
            ]
          resources:
            requests:
              memory: 2G
              cpu: 1000m
            limits:
              memory: 16G
              cpu: 8000m
          livenessProbe:
            initialDelaySeconds: 10
            timeoutSeconds: 10
            httpGet:
              path: /
              port: 8545
          readinessProbe:
            httpGet:
              path: /
              port: 8545
          volumeMounts:
            - name: mainnet
              mountPath: /mainnet
      nodeSelector:
        chain: mainnet
  volumeClaimTemplates:
    - metadata:
        name: "mainnet"
      spec:
        storageClassName: premium-rwo
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 4Ti

---
apiVersion: v1
kind: Service
metadata:
  name: erigon-mainnet
  namespace: erigon-mainnet
spec:
  ports:
    - protocol: TCP
      targetPort: 8545
      port: 8545
      name: http
    - protocol: TCP
      targetPort: 8546
      port: 8546
      name: websocket
  clusterIP: None
  selector:
    app: erigon-mainnet

Running kubectl describe pod yields:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  53s                default-scheduler  Successfully assigned erigon-mainnet/erigon-mainnet-0 to gke-node-cluster-polygon-a017195b-fwhs
  Normal   Pulled     49s                kubelet            Successfully pulled image "docker.io/thorax/erigon" in 430.462783ms
  Normal   Pulled     48s                kubelet            Successfully pulled image "docker.io/thorax/erigon" in 399.71813ms
  Normal   Pulling    30s (x3 over 50s)  kubelet            Pulling image "docker.io/thorax/erigon"
  Normal   Created    29s (x3 over 49s)  kubelet            Created container erigon-mainnet
  Warning  Failed     29s (x3 over 49s)  kubelet            Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "--datadir=/mainnet": stat --datadir=/mainnet: no such file or directory: unknown
  Normal   Pulled     29s                kubelet            Successfully pulled image "docker.io/thorax/erigon" in 417.260296ms
  Warning  BackOff    10s (x8 over 48s)  kubelet            Back-off restarting failed container

My assumption is that I am mounting the SSD to the wrong directory. I have tried leaving the --datadir flag unset and mounting the volume at erigon's default data directory, but I still run into crash loops. With my Geth node, I mounted the volume at /chaindata using exactly the same logic as above, and the node ran fine. If anyone knows what the problem could be, any help is appreciated. I am fairly new to GKE and erigon, so it might be a simple fix I'm overlooking.

Martin Zeitler
0xOsiris

1 Answer


It fails with:

 exec: "--datadir=/mainnet": stat --datadir=/mainnet: no such file or directory
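
One reading of this message worth ruling out first: the exec: prefix shows the runtime tried to run the string "--datadir=/mainnet" itself as the program. In Kubernetes, args replaces the image's CMD and command replaces its ENTRYPOINT, so if an image defines no ENTRYPOINT, the first element of args becomes the executable. A minimal sketch of setting the binary explicitly follows. It assumes the image ships an erigon binary on its PATH and defines no ENTRYPOINT, neither of which is confirmed by the question.

```yaml
# Fragment of the pod template's container list (not a complete manifest).
containers:
  - name: erigon
    image: docker.io/thorax/erigon
    # Assumption: an `erigon` binary is on the image's PATH and the image
    # has no ENTRYPOINT. Check the image configuration before relying on this.
    command: ["erigon"]
    # With `command` set, these are passed to erigon as flags instead of
    # the first one being executed as the program.
    args:
      - "--datadir=/mainnet"
      - "--chain=mainnet"
```

If the image does define an ENTRYPOINT, the command line is unnecessary and the cause lies elsewhere.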

And the erigon documentation reads:

Use --datadir to choose where to store data.

So you can use whatever path you want, but the directory/volume has to exist, so that the command stat /mainnet wouldn't fail. I'd assume that you haven't created a gcePersistentDisk to mount:
https://cloud.google.com/sdk/gcloud/reference/compute/disks/create

gcloud compute disks create --size=1TB --zone=us-central1-a mainnet-data

Then declare as PD, persistent disk:

gcePersistentDisk:
  pdName: mainnet-data
  fsType: ext4
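
For context, a hedged sketch of where that fragment sits in a pod spec: the gcePersistentDisk entry goes under volumes, and the container refers to it by name in volumeMounts. The volume name mainnet and mount path /mainnet are taken from the question, and the disk name mainnet-data from the gcloud command above.

```yaml
# Pod spec fragment (not a complete manifest).
spec:
  containers:
    - name: erigon
      image: docker.io/thorax/erigon
      volumeMounts:
        - name: mainnet          # must match the volume name below
          mountPath: /mainnet    # the path passed to --datadir
  volumes:
    - name: mainnet
      gcePersistentDisk:
        pdName: mainnet-data     # disk created beforehand with gcloud
        fsType: ext4
```

A persistent disk mounted read-write can be attached to only one node at a time, so a workload that mounts this disk directly should run a single replica.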

The volumeClaimTemplates field belongs to kind: StatefulSet; with a manually created disk, use kind: Deployment instead. This example also uses Deployment for everything; it may download the whole blockchain into the --datadir.
And when you look at the Ethereum Chain Full Sync Data Size, this means --size=1TB or 2TB:

759.03 GB for Jun 18 2022


In comparison, expedition only uses an Ethereum RPC endpoint.

Martin Zeitler