XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. The results are improvements in speed, memory usage, and portability on server and mobile platforms. Initially, most users will not see large benefits from XLA, but are welcome to experiment by using XLA via just-in-time (JIT) compilation or ahead-of-time (AOT) compilation.
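For readers who want to experiment with the JIT path, here is a minimal sketch of the two documented opt-in mechanisms (assuming TensorFlow 2.x; the env-var value and decorator argument below are the standard ones, but check your installed version):

```python
import os

# Opt in to XLA auto-clustering for the whole program. TensorFlow reads
# TF_XLA_FLAGS once at import time, so set it before `import tensorflow`.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"

# The finer-grained alternative (TF 2.x) is per-function JIT compilation:
#
#   import tensorflow as tf
#
#   @tf.function(jit_compile=True)  # XLA-compile just this function
#   def dense_relu(x, w, b):
#       return tf.nn.relu(tf.matmul(x, w) + b)
```

The env var turns clustering on globally; the decorator limits compilation (and any XLA-related failures) to one function at a time.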
Questions tagged [tensorflow-xla]
83 questions
15 votes, 1 answer
First tf.session.run() performs dramatically differently from later runs. Why?
Here's an example to clarify what I mean:
First session.run(): [screenshot: first run of a TensorFlow session]
Later session.run(): [screenshot: later runs of a TensorFlow session]
I understand TensorFlow is doing some initialization here, but I'd like to know where in the…

Armando Montanez (213)
12 votes, 2 answers
Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0
I got this error when using Keras with the TensorFlow backend:
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Relevant code:
tfconfig =…

monotasker (510)
7 votes, 1 answer
What is XlaBuilder for?
What's the XLA class XlaBuilder for? The docs describe its interface but don't provide a motivation.
The presentation in the docs, and indeed the comment above XlaBuilder in the source code
// A convenient interface for building up…

joel (6,359)
6 votes, 0 answers
The performance of operation fusion using TensorFlow XLA-JIT on CPU backend
Can anyone give me hints as to why XLA-JIT performs better on the CPU backend?
I tried TensorFlow without and with XLA-JIT (manual mode) on mnist benchmark on a single CPU. Using XLA-JIT achieves 13.6x speedups against TensorFlow without…

MarZzz (329)
5 votes, 2 answers
XLA can't deduce compile time constant output shape for strided slice when using ragged tensor and while loop
Is it possible to get the following minimal example working with experimental_compile=True? I've seen some big speedups with this argument hence I am keen to figure out how to get it working. Thanks!
import tensorflow as tf
print(tf.__version__)
#…

Jeff (718)
5 votes, 0 answers
Getting Tensorflow to automatically detect and use an XLA GPU
I have an XLA GPU that is not automatically detected by TensorFlow, yet I am able to perform computations on it. My desired outcome: the result of print(tf.test.is_gpu_available()) is "True".
Here is the code I am running:
#!/usr/bin/python3
import…

EnnFour (323)
5 votes, 0 answers
Using Tensorflow with RNNs & batch normalisation
I've been tracking down a SEGFAULT in Tensorflow. The issue can be reproduced with the following snippet:
import tensorflow as tf …

Thomas Bastiani (111)
5 votes, 0 answers
Is there any way to fuse a fully connected layer (GEMM) and an activation layer (ReLU/sigmoid) on GPU in a DNN?
Usually one layer in a DNN consists of MatMul, BiasAdd, and Relu; cuBLAS provides GEMM for MatMul, and we can do BiasAdd and Relu in another kernel on the GPU. That makes two GPU launch calls; is there any way to fuse them all together into just one?…

Kan Liu (175)
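The pattern this question asks about (MatMul + BiasAdd + ReLU collapsed into one kernel) is exactly what a fusion pass produces. The arithmetic being fused can be sketched in NumPy; this is an illustration of the math only, not an actual GPU kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # batch of inputs
w = rng.standard_normal((8, 3)).astype(np.float32)   # layer weights
b = rng.standard_normal(3).astype(np.float32)        # bias

# Unfused: three separate passes over memory (GEMM, BiasAdd, ReLU),
# analogous to three kernel launches on a GPU.
y_unfused = np.maximum(np.matmul(x, w) + b, 0.0)

# "Fused": the same math as one loop nest that writes each output
# element exactly once -- the shape of code a fusion pass emits.
y_fused = np.empty((4, 3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        acc = b[j]
        for k in range(8):
            acc += x[i, k] * w[k, j]
        y_fused[i, j] = acc if acc > 0.0 else 0.0
```

Both produce the same result; the fused form saves the intermediate tensors and the extra launch overhead, which is where the speedup comes from.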
5 votes, 1 answer
How to run TensorFlow inference in fp16 with a model trained in fp32
Is there a seamless way to achieve the best fp16 performance on NVIDIA V100/P100?
E.g. I have a model and implementation trained in fp32. The app works perfectly. Now I'd like to explore fp16. Is there any simple…

xiaoyong (61)
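One common low-effort approach is post-training casting: keep the fp32 checkpoint, cast weights and inputs to fp16 for inference, and verify the outputs stay close. A sketch in NumPy (illustrative only; TensorFlow and TensorRT provide their own cast/convert utilities):

```python
import numpy as np

rng = np.random.default_rng(1)
w32 = rng.standard_normal((16, 4)).astype(np.float32)  # fp32-trained weights
x32 = rng.standard_normal((2, 16)).astype(np.float32)  # inference inputs

# Reference inference in fp32.
y32 = x32 @ w32

# Cast-to-fp16 inference: weights and activations in half precision.
y16 = (x32.astype(np.float16) @ w32.astype(np.float16)).astype(np.float32)

# Half precision carries roughly 3 decimal digits, so expect small drift.
max_err = np.max(np.abs(y32 - y16))
```

Checking the drift against an accuracy budget like this is the usual sanity test before committing to fp16 inference.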
4 votes, 1 answer
What is the difference between Tensorflow XLA and Tensorflow Lite / Android NNAPI?
TensorFlow came out with the XLA compiler, which compiles the C++ TensorFlow backend targeting LLVM. My understanding of XLA was that it was a step towards supporting generic accelerated devices, so long as there was an LLVM -> Device…

David Parks (30,789)
4 votes, 0 answers
XLA support for custom kernel implementation on Raspberry Pi GPU
I am trying to implement TensorFlow OpKernels on the Raspberry Pi 3 GPU (QPU) for operations like Conv2D, Pooling, ReLU, etc.
The operations are mainly targeted at improving performance during inference; I do not care about training (hence back propagation…

gowtam (51)
4 votes, 1 answer
Indexing in TensorFlow slower than tf.gather
I am trying to index into a tensor to get a slice or a single element from 1-D tensors. I find that there is a significant performance difference (almost 30-40%) between the numpy way of indexing [:] / slicing and tf.gather.
Also I observe that…

user179156 (841)
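The two access patterns compared in the question compute the same values; only the ops they lower to differ. Sketched in NumPy for reference (in TF, `[:]` lowers to a StridedSlice op while `tf.gather` corresponds to take-style indexing):

```python
import numpy as np

t = np.arange(10.0)           # stand-in for a 1-D tensor
idx = np.array([2, 5])

# Slice-style access: contiguous/strided range (StridedSlice in TF).
a = t[2:8:3]                  # start 2, stop 8, step 3 -> elements 2 and 5

# Gather-style access (tf.gather / np.take): arbitrary index list.
b = np.take(t, idx)           # the same two elements, by explicit index
```

Since the results are identical, the 30-40% gap reported above is purely a property of how each op's kernel is implemented, not of the math.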
3 votes, 1 answer
Loading a model inside while loop of Tensorflow
I am trying to use @tf.function(jit_compile=True) to create a TF graph with a while loop; below is its pseudocode. I'm unable to provide working code since it contains a lot of dependencies.
Code 1
@tf.function(jit_compile=True)
def…

newbie (443)
3 votes, 2 answers
Running TensorFlow with XLA tf.function throws error
When I try to compile this code, I get the error below.
File "xla_test.py", line 25, in
@tf.function(jit_compile=True)
TypeError: function() got an unexpected keyword argument 'jit_compile'

harry (970)
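This error usually means the installed TensorFlow predates the `jit_compile` keyword; older releases spelled it `experimental_compile`. A pure-Python sketch of a version-tolerant fallback (`tf.function` is assumed and stubbed below for illustration):

```python
import inspect

def xla_function(tf_function):
    """Wrap tf_function with whichever XLA JIT keyword its API accepts."""
    params = inspect.signature(tf_function).parameters
    if "jit_compile" in params:            # modern spelling
        return tf_function(jit_compile=True)
    if "experimental_compile" in params:   # older spelling
        return tf_function(experimental_compile=True)
    raise TypeError("no XLA JIT keyword found on this tf.function")

# Stub standing in for an older tf.function, for illustration only.
def old_tf_function(func=None, experimental_compile=False):
    def deco(f):
        f.compiled = experimental_compile
        return f
    return deco if func is None else deco(func)
```

With a real `tf.function` passed in instead of the stub, the same call returns an XLA-compiled function on both old and new releases.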
3 votes, 1 answer
Tensorflow Serving with XLA
Is it possible to enable XLA compilation when doing inference with Tensorflow Serving?
(I am hoping it's just a matter of undocumented configs and that I can avoid implementing a custom Servable).

njs (31)