Questions tagged [flops]

FLOPS (FLoating point Operations Per Second): a unit of measurement used to quantify the performance of the implementation of a numerical algorithm.

See the Wikipedia page on FLOPS.

132 questions
0
votes
1 answer

Python code to benchmark in flops using threading

I'm having trouble writing benchmark code in Python using threading. I was able to get my threading to work, but I can't get my object to return a value. I want to take the values and add them to a list so I can calculate the flops. create class…
Missy
  • 11
  • 1
  • 4
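
One common way to get values back from threads is to have each worker write into a shared list slot. The sketch below is only an illustration of that idea, not the asker's code: `worker`, `benchmark`, and the multiply-add loop are hypothetical names and choices. Note that in CPython the global interpreter lock generally keeps pure-Python arithmetic from running in parallel, so the figure it reports is interpreter throughput rather than hardware peak.

```python
import threading
import time


def worker(n_ops, results, index):
    """Run n_ops multiply-add pairs and store (flop_count, seconds) in results[index]."""
    x = 1.0
    start = time.perf_counter()
    for _ in range(n_ops):
        x = x * 1.0000001 + 0.0000001  # one multiply + one add = 2 floating-point operations
    elapsed = time.perf_counter() - start
    results[index] = (2 * n_ops, elapsed)


def benchmark(n_threads=4, n_ops=100_000):
    """Return (total_flop, wall_seconds, flop_per_second) across all threads."""
    results = [None] * n_threads  # one slot per thread, so no lock is needed
    threads = [
        threading.Thread(target=worker, args=(n_ops, results, i))
        for i in range(n_threads)
    ]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # after join() every slot in results has been filled
    wall = time.perf_counter() - wall_start
    total_flop = sum(flop for flop, _ in results)
    return total_flop, wall, total_flop / wall


if __name__ == "__main__":
    total, wall, rate = benchmark()
    print(f"{total} floating-point operations in {wall:.4f} s = {rate / 1e6:.2f} MFLOPS")
```

Giving each thread its own list index avoids the need for a return value from `Thread.run`, which is discarded by the standard library.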
0
votes
1 answer

Timing Experiment - Matrices

Determine a matrix size that you can comfortably fit into your available RAM. For example, if you have a 4 GB machine, you should be able to comfortably store a matrix that occupies about 800MB. Store this value in a variable Mb. Use the…
Ryan
  • 17
  • 2
0
votes
0 answers

Better pipeline utilization causes lower performance in GPUs

I'm developing a simple OpenCL kernel, which is only doing computation with no memory access at all. Here is the kind of kernel we are executing on the GPU: __kernel void WGS512MAPI8LLXOPS64(const __global float *GIn, __global float *GOut, const int…
saman
  • 199
  • 4
  • 17
0
votes
2 answers

Performance gap between two almost the same OpenCL kernels

I have two nearly identical OpenCL kernels whose performance I want to calculate in GFLOPS. Kernel #1 is: __kernel void Test41(__global float *data, __global float *rands, int index, int rand_max){ float16 temp; int gid =…
saman
  • 199
  • 4
  • 17
0
votes
1 answer

Calculating GPU's maximum flops using OpenCL

I am writing a simple OpenCL application that will calculate the maximum experimental FLOPS of a target GPU device. I have decided to keep my CL kernel as simple as possible. Here are my OpenCL kernel and my host code. Kernel code…
saman
  • 199
  • 4
  • 17
0
votes
1 answer

CuSparse/CuBlas K40 vs GTX Titan X (Maxwell)

I am using both a Tesla K40 and a GTX Titan X, and I have CUDA 8.0. The functions that I use are cuBLAS and cuSPARSE library functions: cusparseDcsrsv2_solve(); cusparseDcsrmv(); cublasDdot(); Why is the GTX Titan X faster than the K40? I am compiling nvcc with…
0
votes
0 answers

How to calculate flops per second

As a previous post and the wiki say, Ivy Bridge can do "8 DP FLOPs/cycle: 4-wide AVX addition + 4-wide AVX multiplication". I'm a bit confused here. I know Ivy Bridge doesn't have FMA, and the AVX instruction set can do 4 DP/cycle, so why 4 addition +…
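
The per-cycle figure quoted in this excerpt is usually turned into a theoretical peak by multiplying it by core count and clock rate. A minimal sketch of that arithmetic follows; the function name `peak_gflops` and the 4-core, 3.0 GHz machine are hypothetical, and only the 8 DP FLOPs/cycle value comes from the excerpt.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS = cores * clock (GHz) * floating-point operations per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # 8 DP FLOPs/cycle is the figure quoted above (4-wide AVX add + 4-wide AVX multiply).
    # The core count and clock rate are made-up values for illustration.
    print(peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=8), "GFLOPS (double precision)")
```

Real code rarely reaches this number, since it assumes an add and a multiply are issued every cycle with no stalls.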
0
votes
1 answer

How to calculate the total number of FOP and floating-point performance of special operations (exp, sin, sqrt)?

When measuring an algorithm that contains division operations, how do I calculate the total number of FOP and the floating-point performance? For example, n^2 matrix multiplication, the calculation of n^3 * 2 flops (a multiplication, an addition), assuming…
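
The 2 * n^3 count mentioned in this excerpt can be checked against a timed run. The sketch below assumes the conventional count of one multiply and one add per inner-loop step; the helper names are hypothetical, and how to weight division or functions such as exp, sin, and sqrt is a matter of convention that this example does not settle.

```python
import time


def matmul_flop_count(n):
    """Conventional count for an n x n times n x n product: n**3 multiplies + n**3 adds."""
    return 2 * n ** 3


def naive_matmul(a, b):
    """Triple-loop product of two square matrices given as lists of lists."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply and one add per step
            c[i][j] = acc
    return c


def measured_flops(n):
    """Time one n x n product and return floating-point operations per second."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    return matmul_flop_count(n) / elapsed


if __name__ == "__main__":
    n = 100
    print(f"n={n}: {matmul_flop_count(n)} operations, {measured_flops(n) / 1e6:.2f} MFLOPS")
```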
0
votes
0 answers

What is the FLOPs performance of MIPS64 architecture CPUs

I've been digging for quite some time and always hit a brick wall when I try to estimate the FLOPs of a MIPS64 CPU series that I'm evaluating for an embedded design. Moreover, I can't seem to find how many floating point operations this CPU can do…
vlex
  • 131
  • 12
0
votes
1 answer

How to create makefile CUDA so it executed in CPU to test CPU FLOPs?

I'm trying to count the GPU and CPU FLOPs. I got the source from here, renamed it to cudaflops.cu, and compiled it with this makefile: ################################################################################ # # Build script for…
Arief Goldalworming
  • 291
  • 1
  • 4
  • 12
0
votes
1 answer

How many FLOPs are there in calculating a factorial using math.factorial(n) in python

I am trying to understand how many FLOPs there are if I use a certain algorithm to find the exponential approximated sum, especially if I use math.factorial(n) in Python. I understand FLOPs for binary operations, so is factorial also a binary…
bhjghjh
  • 889
  • 3
  • 16
  • 42
0
votes
0 answers

Computing capability of each core

I am looking for a benchmark that measures the computing capability of each core in my system (a supercomputer). In other words, I want to find the realistically achievable maximum floating-point operations per second in one processor. I found a…
Matrix
  • 2,399
  • 5
  • 28
  • 53
0
votes
1 answer

Power consumption estimation from number of FLOPS (floating point operations)?

I have extracted how many flops (floating point operations) each of my algorithms consumes. If I implement these algorithms on an FPGA or on a CPU, can I predict (roughly at least) how much power is going to be consumed? Both power…
Mehdi
  • 293
  • 4
  • 13
0
votes
0 answers

Calculating actual flop/core when using actual memory bandwidth

I want to calculate the actual amount of mflop/s/core using the following information: I have measured actual amount of memory bandwidth of each core in 1 node which is 4371 MB/s. I have also measured mflop/s/core on one node if I use only one…
Matrix
  • 2,399
  • 5
  • 28
  • 53
0
votes
1 answer

Calculating mflop/s of a HPC application using memory bandwidth info

I want to calculate the mflops (millions of floating-point operations per second per processor) of an HPC application (NAS benchmark) without running the application. I have measured the memory bandwidth of each core of my system (a supercomputer) using the STREAM benchmark.…
Matrix
  • 2,399
  • 5
  • 28
  • 53
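
One widely used way to relate a measured memory bandwidth to an achievable floating-point rate is a roofline-style bound: the attainable rate is the smaller of the compute peak and bandwidth times arithmetic intensity. The sketch below is an assumption-laden illustration, not the method from either question: `attainable_mflops`, the 10000 MFLOP/s peak, and the 0.125 flop/byte intensity are hypothetical, and only the 4371 MB/s bandwidth figure appears in the excerpts above.

```python
def attainable_mflops(peak_mflops, bandwidth_mb_per_s, flop_per_byte):
    """Roofline-style upper bound in MFLOP/s.

    bandwidth (MB/s) * arithmetic intensity (flop/byte) gives the memory-bound rate;
    the result can never exceed the compute peak.
    """
    memory_bound = bandwidth_mb_per_s * flop_per_byte
    return min(peak_mflops, memory_bound)


if __name__ == "__main__":
    # 4371 MB/s is the per-core bandwidth quoted above; the other two values are made up.
    bound = attainable_mflops(peak_mflops=10000.0, bandwidth_mb_per_s=4371.0, flop_per_byte=0.125)
    print(f"upper bound: {bound} MFLOP/s per core")
```

The arithmetic intensity has to come from analysing the application's kernels, so the bound is only as good as that estimate.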