Questions tagged [openpai]
9 questions
1
vote
1 answer
Single V1.0.0 dev-box for multiple deployments
For deployments before V1.0.0, the admin can start and keep one dev-box container for each OpenPAI cluster. Therefore, the admin can have multiple dev-box containers on a single host/VM for different OpenPAI cluster deployments and…

Joseph
- 35
- 3
1
vote
0 answers
Is there an automatic multi-node distributed processing function for multi-GPU parallel processing code that uses only the CUDA library (without OpenMPI)?
I am building and testing OpenPAI v0.14.0.
Previously, I built OpenPAI on a 1-node, 4-GPU machine and used it for 4-GPU distributed parallel processing.
This time, a new 1-node, 2-GPU machine arrived, and I connected the two nodes.
OpenPAI…

jsh-fw
- 11
- 1
0
votes
1 answer
I have created a PVC; why can't my OpenPAI dashboard see any storage?
Here is my pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-s3-pvc
  namespace: pai-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: csi-s3
and here is my…

gaoyangcaiji
- 1
- 1
0
votes
1 answer
OpenPAI k8s cluster: deploying rest-server failed
Failed to deploy rest-server (latest) on a k8s cluster:
/usr/src/app/src/config/launcher.js: 144
throw new Error('cannot connect to framework launcher');
npm ERR! code ELIFECYCLE
npm ERR! errno 1
python paictl.py service start -n rest-server
I…

victorming888
- 121
- 2
- 5
0
votes
1 answer
Why isn't kube-dns set up in the PAI k8s cluster?
I tried to deploy Kubeflow in the clusters and found there was no DNS service available. I am not sure of the reason; could someone explain it?

Dongqing
- 674
- 1
- 5
- 19
0
votes
1 answer
The difference and relationship between the concepts Job and Framework
I can't find a logical model document for the concepts of OpenPAI. After reading some of the code, I think the Job is the same as the Framework: the Job is the user-facing concept, while the Framework is the internal name. Am I correct?

Dongqing
- 674
- 1
- 5
- 19
0
votes
1 answer
Using OpenPAI, can a task run on more than one worker?
For example, I have built a PAI cluster with 2 workers, and each worker has 2 GPUs. If I want to use four GPUs to run a task, can this cluster meet the demand and use both workers to run the task?

邓泽帅
- 1
0
votes
2 answers
PAI tutorial example failed to run with '[ExitCode]: 177'
I was following the PAI job tutorial.
Here's my job's config:
{
  "jobName": "yuan_tensorflow-distributed-jobguid",
  "image": "docker.io/openpai/pai.run.tensorflow",
  "dataDir": "hdfs://10.11.3.2:9000/yuan/sample/tensorflow",
  "outputDir":…

hao
- 1,144
- 12
- 15
-1
votes
1 answer
Can I use a self-hosted AD with OpenPAI?
For some reason, our servers cannot connect to the Internet, so we have a self-hosted AD server to manage users. I wonder if I could use it, because the docs only say they support AAD :( Thanks!

Panda
- 1
- 1