Questions tagged [openpai]

9 questions
1
vote
1 answer

Single v1.0.0 dev-box for multiple deployments

For deployments before v1.0.0, the admin can start and keep one dev-box container for each OpenPAI cluster. The admin can therefore have multiple dev-box containers on a single host/VM for different OpenPAI cluster deployments and…
Joseph
  • 35
  • 3
1
vote
0 answers

Is there an automatic multi-node distributed processing function for multi-GPU parallel code that uses only the CUDA library (without OpenMPI)?

I am building and testing OpenPAI v0.14.0. Previously, I built OpenPAI on a single-node 4-GPU machine and used it for 4-GPU distributed parallel processing. This time, a new single-node 2-GPU machine arrived, and I connected the two nodes. OpenPAI…
jsh-fw
  • 11
  • 1
0
votes
1 answer

I have created a PVC; why can't my OpenPAI dashboard see any storage?

Here is my pvc.yaml:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: csi-s3-pvc
      namespace: pai-storage
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: csi-s3

and here is my…
0
votes
1 answer

Deploying rest-server on an OpenPAI k8s cluster failed

Failed to deploy the latest rest-server on a k8s cluster:

    /usr/src/app/src/config/launcher.js:144
    throw new Error('cannot connect to framework launcher');
    npm ERR! code ELIFECYCLE
    npm ERR! errno 1

python paictl.py service start -n rest-server I…
victorming888
  • 121
  • 2
  • 5
0
votes
1 answer

Why isn't kube-dns set up in the PAI k8s cluster?

I tried to deploy Kubeflow in the cluster and found that no DNS service was available. I am not sure of the reason; could someone explain it?
Dongqing
  • 674
  • 1
  • 5
  • 19
0
votes
1 answer

The difference and relationship between the concepts Job and Framework

I can't find a logical model document for the concepts of OpenPAI. After reading some of the code, I think a Job is the same as a Framework: the Job is the user-facing concept, while Framework is the internal name. Am I correct?
Dongqing
  • 674
  • 1
  • 5
  • 19
0
votes
1 answer

With OpenPAI, can a task run on more than one worker?

For example, I have built a PAI cluster with 2 workers, and each worker has 2 GPUs. If I want to use four GPUs to run a task, can this cluster meet the demand and use both workers to run the task?
0
votes
2 answers

PAI tutorial example failed to run with '[ExitCode]: 177'

I was following the PAI job tutorial. Here's my job's config: { "jobName": "yuan_tensorflow-distributed-jobguid", "image": "docker.io/openpai/pai.run.tensorflow", "dataDir": "hdfs://10.11.3.2:9000/yuan/sample/tensorflow", "outputDir":…
hao
  • 1,144
  • 12
  • 15
-1
votes
1 answer

Can I use a self-hosted AD with OpenPAI?

For various reasons, our servers cannot connect to the Internet, so we have a self-hosted AD server to manage users. I wonder whether I can use it, because the docs only mention support for AAD :( Thanks!
Panda
  • 1
  • 1