Questions tagged [multi-gpu]

This tag refers to a single application's use of multiple graphics-processing units, whether in traditional (graphical) or general-purpose (GPGPU) applications.

387 questions
0
votes
1 answer
Multi-GPU training using tf.slim takes more time than single GPU
I'm fine-tuning ResNet50 on the CIFAR10 dataset using tf.slim's train_image_classifier.py script:
python train_image_classifier.py \
--train_dir=${TRAIN_DIR}/all \
…

Anas
- 866
- 1
- 13
- 23
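A common reason tf.slim multi-GPU runs end up slower than a single GPU is that per-step gradient synchronization outweighs the per-GPU compute when the model or batch is small. For context, below is a minimal sketch (TF 1.x graph mode) of the tower-style data parallelism that slim's deployment code is built around; build_model() and next_batch() are hypothetical placeholders, not tf.slim APIs.

```python
# A minimal sketch (TF 1.x graph mode) of tower-style data parallelism:
# one model replica per GPU, gradients averaged and applied once.
# build_model() and next_batch() are hypothetical placeholders.
import tensorflow as tf

NUM_GPUS = 2
opt = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []

for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('model', reuse=(i > 0)):   # share weights
            images, labels = next_batch()     # hypothetical input pipeline
            logits = build_model(images)      # hypothetical network
            loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=labels, logits=logits))
            tower_grads.append(opt.compute_gradients(loss))

# Average each variable's gradient across towers; this synchronization
# is the per-step cost that can dominate with small models or batches.
with tf.device('/cpu:0'):
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = tf.stack([g for g, _ in grads_and_vars])
        averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
    train_op = opt.apply_gradients(averaged)
```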
0
votes
1 answer
Default Tensorflow device on multi-GPUs
If I run a TensorFlow model (e.g. cifar10) with one GPU on a multi-GPU platform, TensorFlow creates and broadcasts (training/inference) data across all available GPUs. Since I set num_gpus to 1, it is running on only one GPU. However, I can see…

joshsuihn
- 770
- 1
- 10
- 25
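By default, TensorFlow maps every visible GPU into the process (and reserves their memory) even when the graph only runs on one of them. Below is a sketch of two common ways, under TF 1.x, to restrict a run to a single device.

```python
# A sketch of two common ways (TF 1.x) to keep a run on a single GPU
# instead of letting the process claim every device it can see.
import os

# Option 1: hide the other GPUs from the process entirely (must happen
# before TensorFlow initializes CUDA).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf

# Option 2: tell the session which device it may use and stop it from
# reserving all GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = '0'
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the single-GPU graph here
```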
0
votes
0 answers
Shouldn't Caffe run more quickly on 2 GPUs than on 1?
We've just gotten a multi-GPU machine at work, and I'm trying to verify that 2 GPUs on Caffe are better than 1. To do this, I'm using the quick-train example on the CIFAR-10 dataset. So far, I'm finding that 2 GPUs slow things down, and I don't…

user1245262
- 6,968
- 8
- 50
- 77
0
votes
2 answers
TensorFlow for MultiGPU
If someone could help me understand the situation, it would be great. Thanks in advance.
My setup:
OS: Ubuntu 16.04, 2 Titan X GPUs. TensorFlow (version 0.12.1) installed in a conda environment using pip, as described in the TF docs. Python 3.5.
Code:
I ran the…

Prabu
- 11
- 3
0
votes
1 answer
Is cudaEventRecord affected by the identity of the current device?
cudaEventRecord takes an event ID and a stream ID as parameters. The Runtime API reference does not say whether the stream is required to be associated with the current device - and I can't test whether that's the case since I only have one GPU at…

einpoklum
- 118,144
- 57
- 340
- 684
0
votes
1 answer
Tensorflow: Pinning Variables to CPU in Multi-GPU training not working
I am training my first multi-GPU model using TensorFlow.
As the tutorial states, the variables are pinned to the CPU and the ops are placed on every GPU using name_scope.
As I am running a small test and logging the device placement, I can see the ops being…

Ashish Kumar
- 23
- 6
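Note that tf.name_scope only affects op names, not placement; pinning comes from tf.device. Below is a minimal TF 1.x sketch of a variable kept on the CPU while the op that reads it runs on a GPU, with log_device_placement showing where each node actually landed.

```python
# A minimal sketch (TF 1.x): the variable lives on the CPU, the matmul
# that reads it runs on the GPU, and log_device_placement prints where
# every node was actually placed.
import tensorflow as tf

with tf.device('/cpu:0'):
    w = tf.get_variable('w', shape=[1024, 1024],
                        initializer=tf.zeros_initializer())

with tf.device('/gpu:0'):
    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, w)   # executes on the GPU, pulling w from the CPU

config = tf.ConfigProto(log_device_placement=True,
                        allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)
```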
0
votes
1 answer
How to run OpenCL on multiple GPUs (2) simultaneously?
I have two GPUs, one kernel, a single context and two command queues (one per GPU). I have tried running them in a loop where each command queue is run, and then I have tried both queue.finish() and queue.flush() in the hope of running the work on the…

Mohammad Sohaib
- 577
- 3
- 11
- 28
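The usual pattern for concurrent execution is to enqueue work on every queue and flush them all before any blocking call; calling finish() inside the per-GPU loop serializes the devices. Below is a sketch of that ordering, written with pyopencl purely for brevity (the original code is presumably C/C++) and assuming at least two GPUs sit on the first platform.

```python
# A sketch of the enqueue-everywhere-then-wait ordering using pyopencl
# (assumption: the original is C/C++; pyopencl is used here only for
# brevity, and at least two GPUs sit on the first platform).
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void double_it(__global float *a) {
    int i = get_global_id(0);
    a[i] *= 2.0f;
}
"""

gpus = cl.get_platforms()[0].get_devices(device_type=cl.device_type.GPU)[:2]
ctx = cl.Context(gpus)                                   # one shared context
queues = [cl.CommandQueue(ctx, device=d) for d in gpus]  # one queue per GPU
prg = cl.Program(ctx, KERNEL).build()

host = [np.ones(1 << 20, dtype=np.float32) for _ in gpus]
bufs = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                  hostbuf=h) for h in host]

# 1) Enqueue on every queue and flush, so both devices start working.
for q, b in zip(queues, bufs):
    prg.double_it(q, (host[0].size,), None, b)
    q.flush()

# 2) Only now block: read each result back and wait on its own queue.
for q, h, b in zip(queues, host, bufs):
    cl.enqueue_copy(q, h, b)
    q.finish()
```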
0
votes
1 answer
Tensorflow with Buckets Error
I'm trying to train a sequence-to-sequence model using TensorFlow. I see that in the tutorials, buckets help speed up training. So far I'm able to train using just one bucket, and also using just one GPU and multiple buckets using more or less out…

jon
- 100
- 1
- 8
0
votes
1 answer
tensorflow distributed training hybrid with multi-GPU methodology
After playing with the current distributed training implementation for a while, I think it views each GPU as a separate worker. However, it is common now to have 2-4 GPUs in one box. Isn't it better to adopt the single-box multi-GPU methodology to…

user3742402
- 1
- 1
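One hybrid that the TF 1.x API supports is between-graph replication across workers combined with one in-graph tower per local GPU, with variables pushed to the parameter server. Below is a rough sketch under that assumption; the cluster spec is hypothetical and build_tower_loss() is a placeholder for the model.

```python
# A rough sketch (TF 1.x) of between-graph replication across workers
# combined with one tower per local GPU. The cluster spec is hypothetical
# and build_tower_loss() is a placeholder for the model.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    'ps':     ['ps0.example:2222'],
    'worker': ['worker0.example:2222', 'worker1.example:2222'],
})
server = tf.train.Server(cluster, job_name='worker', task_index=0)

GPUS_PER_WORKER = 2
tower_losses = []

for i in range(GPUS_PER_WORKER):
    # Variables go to the parameter server; this tower's ops go to GPU i
    # of the local worker.
    with tf.device(tf.train.replica_device_setter(
            cluster=cluster,
            worker_device='/job:worker/task:0/gpu:%d' % i)):
        with tf.variable_scope('model', reuse=(i > 0)):
            tower_losses.append(build_tower_loss())   # hypothetical

loss = tf.add_n(tower_losses) / GPUS_PER_WORKER
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```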
0
votes
1 answer
Matlab Dual GPU memory usage
I have a dual-GPU card, the Titan Z. I have Matlab 2016a trying to solve a sparse Ax=b equation set for different 'b' values. The Titan Z has two GPUs and 6 GB of RAM per GPU.
Here is the problem.
If I solve an Ax=b problem on 1 GPU, let's say a 'A'…

coercion
- 53
- 7
0
votes
1 answer
CUDA-aware MPI for two GPUs within one K80
I am trying to optimize the performance of an MPI+CUDA benchmark called LAMMPS (https://github.com/lammps/lammps). Right now I am running with two MPI processes and two GPUs. My system has two sockets, and each socket connects to two K80s. Since each K80…

silence_lamb
- 377
- 1
- 3
- 12
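A common way to pair each MPI rank with one of the K80's two dies is to derive the device index from the rank and restrict visibility before any CUDA context is created. The sketch below uses mpi4py purely for illustration (LAMMPS itself assigns devices through its own GPU/KOKKOS packages) and assumes ranks are placed node by node.

```python
# A sketch (mpi4py, for illustration only; LAMMPS itself assigns devices
# through its GPU/KOKKOS packages) of the usual rank-to-GPU mapping:
# each process picks one of the node's GPUs from its rank before any
# CUDA context exists, so the two dies of a K80 get different ranks.
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

GPUS_PER_NODE = 2                     # the two GK210 dies of one K80
gpu_id = rank % GPUS_PER_NODE         # assumes ranks fill a node in order

# Must be set before the CUDA runtime (or any library using it) starts.
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)

print('rank %d -> GPU %d' % (rank, gpu_id))
# ... launch the CUDA-aware part of the application here ...
```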
0
votes
2 answers
Implicit working of Multi GPU
In OpenCL, is it possible that a system consisting of multiple GPUs implicitly divides the job without the programmer explicitly dividing the workload?
For example, say I have a GPU consisting of 1 SM with 192 cores and run a matrix multiplication, which works…

pradyot
- 174
- 12
0
votes
1 answer
Parallel Matrix Multiplication using multiple GPUs
I have installed two GPUs (2x Nvidia Quadro 410) in my system in different PCI slots. To solve matrix multiplication on both of these GPUs, how can I split the input matrices such that each GPU processes/computes a part of the output matrix and then…

pradyot
- 174
- 12
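The standard decomposition for C = A x B is by rows of A: each GPU gets a contiguous block of rows plus all of B, computes its block of C independently, and the blocks are concatenated at the end. Below is a sketch of that split, using TF 1.x device placement purely for illustration (the question itself is framework-agnostic).

```python
# A sketch of a row-wise split for C = A x B across two GPUs, written
# with TF 1.x device placement purely for illustration. Each GPU gets
# half the rows of A plus all of B, computes its half of C, and the
# halves are concatenated.
import numpy as np
import tensorflow as tf

A = np.random.rand(4096, 2048).astype(np.float32)
B = np.random.rand(2048, 1024).astype(np.float32)
row_blocks = np.array_split(A, 2, axis=0)        # split A's rows in two

partials = []
for i, block in enumerate(row_blocks):
    with tf.device('/gpu:%d' % i):
        partials.append(tf.matmul(tf.constant(block), tf.constant(B)))

C = tf.concat(partials, axis=0)                  # stitch the halves back

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    result = sess.run(C)
print(result.shape)                              # (4096, 1024)
```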
0
votes
4 answers
CUDA: Memory copy to GPU 1 is slower in multi-GPU
My company has a setup of two GTX 295s, so a total of 4 GPUs in a server, and we have several servers.
GPU 1 specifically was slow in comparison to GPUs 0, 2 and 3, so I wrote a little speed test to help find the cause of the problem.
//#include…

zenna
- 9,006
- 12
- 73
- 101
0
votes
1 answer
glCreateSyncFromCLeventARB alternative?
I would like to save a call to clFinish() in OpenCL before using a cl_command_queue's result in OpenGL (I have a shared image/texture used in OpenCL/GL).
I found in the book "OpenCL Programming by Example" (p. 243) that creating a GLsync from an OpenCL…

Yoav
- 5,962
- 5
- 39
- 61