I am using the .yaml file below to create Katib Experiment in Kubeflow. However, I am getting
Failed to reconcile: cannot restore struct from: string
errors. Have any solutions for this? Most of the Katib Experiment example codes doesn't have a volume in it, but I'm trying to mount a volume after downloading the data from my S3.
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
namespace: apple
labels:
controller-tools.k8s.io: "1.0"
name: transformer-experiment
spec:
objective:
type: maximize
goal: 0.8
objectiveMetricName: Train-accuracy
additionalMetricNames:
- Train-loss
algorithm:
algorithmName: random
parallelTrialCount: 3
maxTrialCount: 12
maxFailedTrialCount: 3
metricsCollectorSpec:
collector:
kind: StdOut
parameters:
- name: --lr
parameterType: double
feasibleSpace:
min: "0.01"
max: "0.03"
- name: --dropout_rate
parameterType: double
feasibleSpace:
min: "0.005"
max: "0.020"
- name: --layer_count
parameterType: int
feasibleSpace:
min: "2"
max: "5"
- name: --d_model_count
parameterType: categorical
feasibleSpace:
list:
- "64"
- "128"
- "256"
trialTemplate:
goTemplate:
rawTemplate: |-
apiVersion: batch/v1
kind: Job
metadata:
name: {{.Trial}}
namespace: {{.NameSpace}}
spec:
template:
spec:
volumes:
- name: train-data
emptyDir: {}
containers:
- name: data-download
image: amazon/aws-cli
command:
- "aws s3 sync s3://kubeflow/kubeflowdata.tar.gz /train-data"
volumeMounts:
- name: train-data
mountPath: /train-data
- name: {{.Trial}}
image: <Our Image>
command:
- "cd /train-data"
- "ls"
- "python"
- "/opt/ml/src/main.py"
- "--train_batch=64"
- "--test_batch=64"
- "--num_workers=4"
volumeMounts:
- name: train-data
mountPath: /train-data
{{- with .HyperParameters}}
{{- range .}}
- "{{.Name}}={{.Value}}"
{{- end}}
{{- end}}
restartPolicy: Never