Highest Voted 'memory-bandwidth' Questions

134 votes, 13 answers

Any optimization for random access on a very big array when the value in 95% of cases is either 0 or 1?

Is there any possible optimization for random access on a very big array (I currently use uint8_t, and I'm asking about what's better) uint8_t MyArray[10000000]; when the value at any position in the array is 0 or 1 for 95% of all cases, 2 in 4%…

asked May 14 '18 at 05:23 by JohnAl (1,064 reputation)

86 votes, 1 answer

memory bandwidth for many channels x86 systems

I'm testing the memory bandwidth on a desktop and a server. Skylake desktop: 4 cores/8 hardware threads. Skylake server: Xeon 8168, dual socket, 48 cores (24 per socket)/96 hardware threads. The peak bandwidth of the system is: Peak bandwidth desktop =…

Tags: c x86 openmp avx512 memory-bandwidth

asked Jun 28 '19 at 09:05 by Z boson (32,619 reputation)

52 votes, 8 answers

How to increase performance of memcpy

Summary: memcpy seems unable to transfer over 2GB/sec on my system in a real or test application. What can I do to get faster memory-to-memory copies? Full details: As part of a data capture application (using some specialized hardware), I need to…

Tags: c visual-studio memcpy cvi memory-bandwidth

asked Nov 23 '10 at 20:33 by leecbaker (3,611 reputation)

40 votes, 4 answers

Why does vectorizing the loop over 64-bit elements not improve performance over large buffers?

I am investigating the effect of vectorization on the performance of the program. In this regard, I have written following code: #include #include #include #define LEN 10000000 int main(){ struct timeval…

Tags: c performance simd icc memory-bandwidth

asked Aug 10 '13 at 06:55 by Pouya (1,871 reputation)

14 votes, 5 answers

Can the Intel performance monitor counters be used to measure memory bandwidth?

Can the Intel PMU be used to measure per-core read/write memory bandwidth usage? Here "memory" means to DRAM (i.e., not hitting in any cache level).

Tags: performance x86 intel-pmu memory-bandwidth

asked Dec 02 '17 at 21:37 by BeeOnRope (60,350 reputation)

13 votes, 3 answers

What specifically marks an x86 cache line as dirty - any write, or is an explicit change required?

This question is specifically aimed at modern x86-64 cache coherent architectures - I appreciate the answer can be different on other CPUs. If I write to memory, the MESI protocol requires that the cache line is first read into cache, then modified…

Tags: x86 x86-64 cpu-architecture cpu-cache memory-bandwidth

asked Nov 21 '17 at 16:04 by Tim (916 reputation)

10 votes, 1 answer

MOVSD performance depends on arguments

I just noticed that pieces of my code exhibit different performance when copying memory. A test showed that memory-copy performance degraded if the address of the destination buffer is greater than the address of the source. Sounds ridiculous, but the…

Tags: performance delphi assembly x86 memory-bandwidth

asked Jul 21 '19 at 21:18 by user4859735 (103 reputation)

10 votes, 3 answers

How to get memory bandwidth from memory clock/memory speed

FYI, here are the specs I got from Nvidia: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680/specifications http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications Note that the memory speed/memory clock are the same…

Tags: gpu memory-bandwidth

asked Feb 24 '13 at 19:49 by Blue_Black (307 reputation)
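The relation this question asks about is usually written as peak bandwidth = effective data rate × bus width ÷ 8. The worked sketch below assumes an effective rate of 6.008 GT/s per pin and the 256-bit (GTX 680) and 384-bit (GTX Titan) memory interfaces NVIDIA lists for these cards; the bus widths are not in the excerpt, so treat them as assumptions to check against the linked spec pages.

```latex
\begin{aligned}
B_{\text{peak}} &= \frac{f_{\text{eff}} \times w}{8} \\
\text{GTX 680 (assumed } w = 256\text{):}\quad
  B_{\text{peak}} &= \frac{6.008\ \text{GT/s} \times 256\ \text{bit}}{8\ \text{bit/B}} = 192.256\ \text{GB/s} \\
\text{GTX Titan (assumed } w = 384\text{):}\quad
  B_{\text{peak}} &= \frac{6.008\ \text{GT/s} \times 384\ \text{bit}}{8\ \text{bit/B}} = 288.384\ \text{GB/s}
\end{aligned}
```

With the memory clock held equal, the wider bus alone accounts for the 1.5× difference (384 / 256 = 1.5).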

9 votes, 3 answers

What does the STREAM memory bandwidth benchmark really measure?

I have a few questions about the STREAM (http://www.cs.virginia.edu/stream/ref.html#runrules) benchmark. Below is the comment from stream.c. What is the rationale behind the requirement that the arrays should be 4 times the size of the cache? * (a) Each…

Tags: benchmarking cpu-architecture microbenchmark memory-bandwidth

asked May 11 '19 at 03:44 by yeeha (139 reputation)

9 votes, 1 answer

Roofline model: calculating operational intensity

Say I have a toy loop like this: float x[N]; float y[N]; for (int i = 1; i < N-1; i++) y[i] = a*(x[i-1] - x[i] + x[i+1]); And I assume my cache line is 64 bytes (i.e. big enough). Then I will have (per frame) basically 2 accesses to the RAM and 3…

Tags: c++ performance memory-bandwidth roofline

asked Nov 22 '16 at 22:54 by Armen Avetisyan (1,140 reputation)
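The operational-intensity arithmetic for the toy loop in this question can be sketched as follows. This assumes 4-byte floats, that x[i-1] and x[i] are already in cache from earlier iterations (so each x element crosses the DRAM bus once), and that a stays in a register. W, Q, I, P_peak and B_peak are standard roofline notation introduced here, not symbols from the question.

```latex
\begin{aligned}
W &= 3\ \text{flop per iteration} \quad (\text{one subtraction, one addition, one multiplication}) \\
Q &= \underbrace{4\ \text{B}}_{\text{load } x[i+1]} + \underbrace{4\ \text{B}}_{\text{store } y[i]} = 8\ \text{B}
  \quad (12\ \text{B if the store also triggers a write-allocate read of } y[i]) \\
I &= \frac{W}{Q} = \frac{3}{8} = 0.375\ \text{flop/B} \quad \left(\tfrac{3}{12} = 0.25\ \text{flop/B with write-allocate}\right) \\
P &= \min\left(P_{\text{peak}},\ I \times B_{\text{peak}}\right)
\end{aligned}
```

At either value of I, the loop sits far to the left of the ridge point on typical CPUs, so the attainable performance P is set by the memory bandwidth B_peak rather than by the peak compute rate P_peak.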

9 votes, 2 answers

Why is memset slow?

The spec for my CPU says it should get 5.336 GB/s of bandwidth to memory. To test this, I wrote a simple program that runs memset (or memcpy) on a big array and reports the timing. I'm getting 3.8 GB/s on memset and 1.9 GB/s on memcpy. …

Tags: optimization memcpy memset memory-bandwidth

asked Apr 29 '14 at 20:07 by Jeff Guy (157 reputation)

7 votes, 2 answers

OpenMP and cores/threads

My CPU is a Core i3 330M with 2 cores and 4 threads. When I execute the command cat /proc/cpuinfo in my terminal, it looks as if I have 4 CPUs. When I use the OpenMP function omp_get_num_procs() I also get 4. Now I have a standard C++ vector class, I…

Tags: c++ parallel-processing cpu openmp memory-bandwidth

asked Feb 15 '12 at 11:07 by Benjamin (366 reputation)

6 votes, 5 answers

Efficient memory bandwidth use for streaming

I have an application that streams through 250 MB of data, applying a simple and fast neural-net threshold function to the data chunks (which are just two 32-bit words each). Based on the result of the (very simple) compute, the chunk is unpredictably…

Tags: optimization streaming cpu-cache memory-bandwidth

asked Apr 02 '09 at 11:17 by SPWorley (11,550 reputation)

5 votes, 1 answer

Erroneous single thread memory bandwidth benchmark

In an attempt to measure the bandwidth of the main memory, I have come up with the following approach. Code (for the Intel compiler): #include #include <iostream> // std::cout #include <limits> // std::numeric_limits #include //…

Tags: c++ assembly performance-testing benchmarking memory-bandwidth

asked Mar 10 '22 at 16:42 by Nitin Malapally (534 reputation)

5 votes, 0 answers

Can x86's lock prefix on uncacheable memory cause a Denial of Service on memory bandwidth?

Can an instruction with a lock prefix starve the rest of the CPUs (virtual machines) of memory bandwidth in a virtualized environment? For example, consider the following piece of code: loop: lock inc dword [rax] jmp loop. Now assume that rax…

Tags: x86 locking atomic cpu-architecture memory-bandwidth

asked May 30 '18 at 19:18 by joz (319 reputation)
