
I'm using glDrawArraysInstanced to draw 10000 instances of a simple shape composed of 8 triangles.

After switching the GPU used for rendering to my dedicated NVIDIA GTX 1060, I'm getting a lower framerate and some visible stuttering.

This is the code I'm using to measure the time taken for each frame:

#include <chrono>
#include <iostream>

std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
float i = (float)(std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()) / 1000000.0f;
while (!glfwWindowShouldClose(window)) {
    end = std::chrono::steady_clock::now();
    i = (float)(std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()) / 1000000.0f;
    std::cout << i << "\n";  // seconds elapsed since the previous frame
    begin = end;  // Edit: reset the reference point each frame
    // Other code for draw calls and to set uniforms.
}

Is this the wrong way to measure time elapsed per frame? If not, why is there a drop in performance?
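
For a GPU-side view, an OpenGL timer query can isolate render time from CPU work in the loop. A minimal sketch, assuming a single query object (called timeQuery here) whose blocking readback stalls the pipeline, so it is only good for rough measurement:

GLuint timeQuery;
glGenQueries(1, &timeQuery);

while (!glfwWindowShouldClose(window)) {
    glBeginQuery(GL_TIME_ELAPSED, timeQuery);
    // ... draw calls ...
    glEndQuery(GL_TIME_ELAPSED);

    // Blocks until the GPU has finished this frame's commands; a real app
    // would round-robin several queries to avoid the stall.
    GLuint64 gpuTimeNs = 0;
    glGetQueryObjectui64v(timeQuery, GL_QUERY_RESULT, &gpuTimeNs);
    std::cout << gpuTimeNs / 1e9 << " s of GPU time\n";

    glfwSwapBuffers(window);
    glfwPollEvents();
}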

Here is a comparison of the output:

Comparison Image

Updated Comparison Image

Edit:

The fragment shader simply sets a color for each fragment directly.

Vertex Shader Code:

#version 450 core

in vec3 vertex;

out vec3 outVertex;

uniform mat4 mv_matrix;
uniform mat4 proj_matrix;
uniform float time;

const float vel = 1.0;

const float PHI = 1.61803398874989484820459;  // golden ratio

// Pseudo-random value in [0, 1) from a 2D coordinate and a seed.
float noise(in vec2 xy, in float seed) {
    return fract(tan(distance(xy * PHI, xy) * seed) * xy.x);
}

void main() {

    // Per-instance vertical offset, animated over time and wrapped to [-20, 20).
    float y_coord = noise(vec2(-500 + gl_InstanceID / 100, -500 + gl_InstanceID % 100), 20) * 40 + vel * time;
    y_coord = mod(y_coord, 40) - 20;

    // Place each instance on a 100x100 grid in the xz-plane (column-major mat4).
    mat4 translationMatrix = mat4(vec4(1, 0, 0, 0),
                                  vec4(0, 1, 0, 0),
                                  vec4(0, 0, 1, 0),
                                  vec4(-50 + gl_InstanceID / 100, y_coord, -50 + gl_InstanceID % 100, 1));

    gl_Position = proj_matrix * mv_matrix * translationMatrix * vec4(vertex, 1);

    outVertex = vertex;
}

I'm selecting the card used for rendering with the NvOptimusEnablement export, which asks the driver to use the dedicated NVIDIA GPU:

extern "C" {
    _declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

The rendered output is the same for both cards and is shown here:

Output

The desired result is a higher frame rate when rendering on the dedicated GPU, i.e. smaller time gaps between the rows in the attached comparison image. With the integrated Intel card it takes <0.01 seconds to render a frame; with the dedicated GTX 1060 it takes ~0.2 seconds.

curs0r
  • Edited for clarity and desired output. – curs0r Jul 01 '20 at 18:25
  • I have also used RenderDoc to measure FPS and there is a huge difference. When using the Intel card, I get around 340 FPS. On changing to the NVIDIA card, I get a mere 44 FPS. – curs0r Jul 02 '20 at 20:13
  • This seems to be an issue not with my application but with the graphics card usage. I tried various applications built on OpenGL and it seems the FPS is locked at 45. I changed the "OpenGL Rendering GPU" property in the NVIDIA Control Panel, but it didn't work. – curs0r Jul 03 '20 at 18:19

2 Answers


I solved the issue by disabling NVIDIA PhysX GPU acceleration. For some reason it slows down graphics rendering. Now I'm getting ~280 FPS on my GPU even when rendering ~100k instances.

curs0r

Your output clearly shows the times monotonically increasing rather than jittering around some mean value. That's because your code is measuring total elapsed time, not per-frame time. To measure per-frame time instead, you need a begin = end assignment at the end of your loop, so that the reference point for each frame is the end of the preceding frame rather than the start time of the whole program.
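
In outline, a minimal sketch of just the timing (draw calls omitted):

#include <chrono>
#include <iostream>

auto begin = std::chrono::steady_clock::now();
while (!glfwWindowShouldClose(window)) {
    // ... draw calls and uniform updates ...
    auto end = std::chrono::steady_clock::now();
    // duration<float> yields seconds directly, avoiding the manual cast/divide
    float frameSeconds = std::chrono::duration<float>(end - begin).count();
    std::cout << frameSeconds << "\n";
    begin = end;  // measure the next frame from the end of this one
}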

Drew Hall
  • Hey, I should've used that. Won't I get the time per frame if I take the difference of consecutive rows in my current output? – curs0r Jul 02 '20 at 19:31
  • @curs0r: Yes, you would. Also, sorry my answer doesn't really address your true question about the performance difference between integrated & dedicated graphics. My suspicion is that the tan() call is using a dedicated x86 intrinsic with integrated graphics, whereas the GPU has to do more work to compute it. No proof, just a wild guess. – Drew Hall Jul 02 '20 at 19:38
  • That might be the case; I'll look into it more. Thanks for your input. I've added another comparison image for better clarity. – curs0r Jul 02 '20 at 19:43