I have a Windows application that currently renders most of its graphics using MFC, and I'd like to change it to make better use of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some could prove very difficult. Specifically, in addition to the normal mesh-type objects, I'm also dealing with point clouds that are liable to contain billions of Cartesian points, stored in a very compact manner, which rely on quite a lot of custom culling techniques to be displayed in real time (Example).
What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z-buffer, and the camera parameters, so that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL, or a higher-level framework such as OpenSceneGraph, and which would be the best starting point.
Given that the software is Windows-based, I'd probably prefer to use Direct3D, as this is likely to lead to the fewest driver issues, which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, I currently have the following:
1. A display list / scene in memory, which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low-performing drawing primitives
2. A point cloud in memory, which may contain billions of points in a highly compressed format (~4.5 bytes per 3D point), which I cull and output to the same bitmap
3. Cursor information that gets added to the bitmap prior to output
4. A camera, z-buffer, and attribute buffers for navigation and picking purposes
The slow bit is the highlighted part of item 1, which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with a matching z-buffer) based on my current camera parameters, and then add my point cloud prior to output.
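To make that last step concrete, here is a minimal sketch of what the compositing might look like once the GPU's colour and depth buffers have been read back to the CPU. The names (FrameBuffers, ProjectedPoint, compositePoints) are hypothetical, and it assumes the points have already been culled and projected to pixel coordinates and that both depth sources use the same convention, with smaller values nearer the camera.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side copy of the GPU render target and its z-buffer.
struct FrameBuffers {
    int width = 0;
    int height = 0;
    std::vector<std::uint32_t> color; // packed pixels, width * height entries
    std::vector<float> depth;         // assumed: smaller value = nearer the camera
};

// Hypothetical point, already culled and projected to pixel coordinates.
struct ProjectedPoint {
    int x;
    int y;
    float depth;         // assumed to be in the same depth space as FrameBuffers::depth
    std::uint32_t color;
};

// Writes each point only where it is nearer than what the buffers already hold.
// Returns the number of pixels that were overwritten.
inline std::size_t compositePoints(FrameBuffers& fb,
                                   const std::vector<ProjectedPoint>& points) {
    const std::size_t pixelCount =
        static_cast<std::size_t>(fb.width) * static_cast<std::size_t>(fb.height);
    assert(fb.color.size() == pixelCount && fb.depth.size() == pixelCount);

    std::size_t written = 0;
    for (const ProjectedPoint& p : points) {
        // Skip points that project outside the viewport.
        if (p.x < 0 || p.y < 0 || p.x >= fb.width || p.y >= fb.height) {
            continue;
        }
        const std::size_t i = static_cast<std::size_t>(p.y) *
                                  static_cast<std::size_t>(fb.width) +
                              static_cast<std::size_t>(p.x);
        // Standard "less than" depth test against the mesh z-buffer.
        if (p.depth < fb.depth[i]) {
            fb.depth[i] = p.depth;
            fb.color[i] = p.color;
            ++written;
        }
    }
    return written;
}
```

The readback itself is not shown, and a hardware z-buffer is usually non-linear, so in practice the point depths would have to be converted into that same space (or the buffer linearised) before the comparison is meaningful.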
Alternatively, I could move to a scene-based framework that manages the cameras and navigation for me, and supply the points in view as spheres or splats, sized by volume and level of detail, during the rendering loop. In this scenario I'd also need to be able to add cursor information to the view.
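As a rough illustration of the splat sizing this would involve, the sketch below estimates the on-screen radius of a splat under a standard symmetric perspective projection. The function names and parameters are hypothetical, and the "world radius" is assumed to come from whatever volume or level-of-detail information the point cloud structure already holds.

```cpp
#include <cmath>

// Focal length in pixels for a symmetric perspective camera,
// given the vertical field of view (radians) and viewport height (pixels).
inline double focalLengthPixels(double verticalFovRadians, int viewportHeight) {
    return (viewportHeight * 0.5) / std::tan(verticalFovRadians * 0.5);
}

// Approximate on-screen radius (pixels) of a splat with the given world-space
// radius at the given view-space depth. Returns 0 for points at or behind the eye.
inline double splatRadiusPixels(double worldRadius, double viewDepth,
                                double verticalFovRadians, int viewportHeight) {
    if (viewDepth <= 0.0) {
        return 0.0;
    }
    return worldRadius * focalLengthPixels(verticalFovRadians, viewportHeight) / viewDepth;
}
```

For example, with a 90 degree vertical field of view and a 1000 pixel high viewport, a splat of radius 0.01 at depth 5 comes out at roughly 1 pixel, which could serve as a threshold for switching to a coarser level of detail.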
In either scenario, the host application will remain MFC-based C++ built with VS2017, as changing that would require too much work for the purposes of this exercise.