I'm following the official AWS EKS tutorial on setting up a distributed GPU cluster for TensorFlow model training and have hit a snag.

After creating a new cluster with eksctl and verifying that the corresponding ~/.kube/config file exists on my gateway node, the tutorial instructs me to download ksonnet on the gateway node and use it to initialize a new application:

$ ks init <app-name>

When I try running this, however, I receive the following error:

INFO Using context "arn:aws:eks:us-west-2:131397771409:cluster/<cluster name>" from kubeconfig file "/home/ubuntu/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.18.9" cluster at address <cluster address>
ERROR No Major.Minor.Patch elements found

I've searched GitHub and SO but have not been able to find a resolution to this issue. I suspect the real answer is to move away from ksonnet, since it is no longer maintained (and apparently hasn't been for the last 2 years), but for the time being I'd just like to be able to complete the tutorial :)

Any insight is appreciated!

Contents of my ~/.kube/config:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <certificate>
    server: <server>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:131397771409:cluster/<name>
    user: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
current-context: arn:aws:eks:us-west-2:131397771409:cluster/<name>
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-west-2
      - eks
      - get-token
      - --cluster-name
      - <name>
      command: aws

1 Answer

When running init, you can override the API spec version. That worked for me on that particular step, although I ran into other issues later on:

ks init ${APP_NAME} --api-spec=version:v1.7.0

Reference
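
If you end up repeating that step, the override can be wrapped in a small helper. This is only a sketch: ks_init_pinned is a hypothetical function name, not part of ksonnet or the AWS tutorial, and the default of version:v1.7.0 is simply the value that worked for me above, not something verified against every cluster version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ksonnet or the AWS tutorial).
# Runs `ks init` with the API spec pinned explicitly instead of letting
# ksonnet pick it up from the cluster.
# The default spec, version:v1.7.0, is just the value from the answer above.
ks_init_pinned() {
  local app_name="$1"
  local api_spec="${2:-version:v1.7.0}"

  # Refuse to run without an application name.
  if [ -z "$app_name" ]; then
    echo "usage: ks_init_pinned <app-name> [api-spec]" >&2
    return 2
  fi

  ks init "$app_name" --api-spec="$api_spec"
}
```

For example, ks_init_pinned "${APP_NAME}" runs the same command as above, and a second argument such as version:v1.8.0 swaps in a different spec if you want to experiment.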

In the end, I got it working with ks init ${APP_NAME} (without --api-spec) on GCP, using ksonnet v0.13.1 with older versions of Kubeflow (v0.2.0-rc.1) and GKE (1.14.10).
BTW, I was in the "Kubeflow: End to End" qwiklab from this course.

Jones