For a while now I've been developing an invisible (read: it produces no visual output) stressor to test the capabilities of my graphics card, and also as an exploration of DirectCompute in general, which I'm pretty new to. Here is the code I have right now, which I'm pretty proud of:
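
    RWStructuredBuffer<uint> BufferOut : register(u0);

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        uint total = 0;
        float p = 0;
        // Test M = 2^p - 1 for p = 1 .. 40.
        while (p++ < 40.0)
        {
            float s = 4.0;
            float M = pow(2.0, p) - 1.0;
            // Lucas-Lehmer recurrence: p - 2 iterations of s = (s*s - 2) mod M.
            for (uint i = 0; i + 2 < p; i++)
            {
                s = ((s * s) - 2) % M;
            }
            // A final s of 0 marks M as prime. float has a 24-bit significand, so
            // s * s stops being exact once p exceeds about 12; beyond that the loop
            // is GPU load rather than a reliable primality test.
            if (s < 1.0) total++;
        }
        BufferOut[DTid.x] = total;
    }
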
This runs the Lucas-Lehmer test on the Mersenne numbers 2^p - 1 for p = 1 through 40. When I dispatch this code in a timed loop and watch my graphics card's stats in GPU-Z, my GPU load shoots to 99% for the duration. I'm pretty happy with this, but I also notice that the heat generated by a fully loaded GPU is actually pretty minimal: I'm getting about a 5 to 10 degree Celsius jump, nowhere near the jump I get when running, say, Borderlands 2. My thought is that most of my heat comes from memory accesses, so I would need to include consistent memory accesses across the run. My initial code looked like this:

    RWStructuredBuffer<uint> BufferOut : register(u0);
    groupshared float4 memory_buffer[1024];

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        uint total = 0;
        float p = 0;
        while (p++ < 40.0)
        {
            [fastopt] // to lower compile times - code efficiency is strangely not what I'm looking for right now.
            for (uint j = 0; j < 1024; ++j)
            {
                // accesses to memory_buffer[j] go here
            }
            float s = 4.0;
            float M = pow(2.0, p) - 1.0;
            // Lucas-Lehmer recurrence: p - 2 iterations.
            for (uint i = 0; i + 2 < p; i++)
            {
                s = ((s * s) - 2) % M;
            }
            if (s < 1.0) total++;
        }
        BufferOut[DTid.x] = total;
    }

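
As a cross-check on the arithmetic in the shaders above, here is a small CPU-side sketch of the same Lucas-Lehmer recurrence. It is written in C++ rather than HLSL so it can run without a GPU. The function names `lucas_lehmer_exact` and `lucas_lehmer_float` are made up for illustration, and the exact version assumes p is at most 31 so that the intermediate product fits in 64 bits. The float version mirrors the shader's types, and it should only be trusted while `s * s` still fits in float's 24-bit significand (p up to about 12).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Exact Lucas-Lehmer test for M = 2^p - 1.
// Assumption for this sketch: 2 <= p <= 31, so s * s + M stays below 2^64.
bool lucas_lehmer_exact(unsigned p) {
    assert(p >= 2 && p <= 31);
    if (p == 2) return true;  // M = 3 is prime; the recurrence itself needs p >= 3
    const std::uint64_t M = (std::uint64_t{1} << p) - 1;
    std::uint64_t s = 4;
    for (unsigned i = 0; i + 2 < p; ++i) {  // p - 2 iterations
        s = (s * s + M - 2) % M;            // adding M keeps the value non-negative
    }
    return s == 0;
}

// The same recurrence in single precision, mirroring the shader's float math.
// Assumption for this sketch: 3 <= p <= 31. The result is only reliable
// while s * s is exactly representable in float.
bool lucas_lehmer_float(unsigned p) {
    assert(p >= 3 && p <= 31);
    const float M = std::ldexp(1.0f, static_cast<int>(p)) - 1.0f;
    float s = 4.0f;
    for (unsigned i = 0; i + 2 < p; ++i) {
        s = std::fmod(s * s - 2.0f, M);  // result keeps the sign of the dividend
    }
    return s == 0.0f;
}
```

Counting the values of p from 2 to 31 for which `lucas_lehmer_exact` returns true gives a reference for what an exact shader should write to `total` over that range. Comparing the two functions for larger p is one way to see where the float version starts to drift.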