Given a vertex array: {v1, v2, v3, v4, v5, ..., vN}
And K polygons indexing it, each with a block of indices like this sample 4-sided polygon: {v7, v2, v51, v16}
Note that two or more polygons may share the same vertex. In fact, most vertices will be shared by 4-6 polygons (valence 4 for quad meshes, valence 6 for triangle meshes).
... how might we efficiently reorder/sort the vertex data so as to mitigate cache misses when reading the vertices of a given polygon? I'm more interested in an algorithm that finishes in a reasonable time than in one that gives the best possible results. Even a rough heuristic is better than a completely arbitrary order here.
The ideal is to turn something like {v1052, v507213, v63252, v3} into something more like {v70, v71, v72, v73}. This ideal will probably often be unachievable without some outliers, given how heavily the vertices are shared.
Full-blown solutions are certainly welcome, but I'm more interested in the name given to the family of algorithms concerned with this kind of reorganization of user data at runtime for cache efficiency. I imagine such algorithms must exist, especially in the realm of efficient VBO representations of a mesh, but I have failed to come up with the proper search criteria. Would this still be called 'sorting'?
I can see two categories of ways to tackle this problem: one actually dealing with traversing mesh neighbors, which would be very specific to the mesh representation, and another that simply looks at the general access pattern of one set of arrays indexing or pointing to entries in another. The latter appeals to me a bit more as a more generalized solution, even if it may not be as effective.
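For concreteness, the generalized flavor I have in mind might look something like this rough sketch (the name reorder_by_first_use is just mine and this is speculation, not a known algorithm): given a flat array and a stream of indices into it, reorder the array so elements appear in order of first use, and remap the index stream to match:

#include <cstddef>
#include <vector>

// Hypothetical sketch of the generalized approach: reorder 'data' so
// that elements appear in the order the index stream first touches
// them, and remap 'indices' in place to point at the new locations.
template <typename T>
void reorder_by_first_use(std::vector<T>& data, std::vector<int>& indices)
{
    const int n = static_cast<int>(data.size());
    std::vector<int> to_new(n, -1); // old index -> new index, -1 = unseen
    std::vector<T> reordered;
    reordered.reserve(n);
    for (std::size_t i=0; i < indices.size(); ++i)
    {
        const int old_index = indices[i];
        if (to_new[old_index] < 0)
        {
            // First touch: this element gets the next slot.
            to_new[old_index] = static_cast<int>(reordered.size());
            reordered.push_back(data[old_index]);
        }
        indices[i] = to_new[old_index];
    }
    // Keep elements the stream never touched, appended at the end.
    for (int i=0; i < n; ++i)
        if (to_new[i] < 0)
            reordered.push_back(data[i]);
    data.swap(reordered);
}

For a mesh, the index stream would just be the concatenated vertex index blocks of all the polygons.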
Update:
As suggested, I attempted coding some of the suggested heuristics instead of speculating too much about them algorithmically. It gave me a rough base to start with and helped me understand the nature of the problem better, but unfortunately with poor results. I tagged the question as both C and C++ since I was more interested in algorithmic research suggestions and possibly snippets in those languages than in providing my own implementation, but here's mine in C++ just to compare:
#define _SECURE_SCL 0
#include <iostream>
#include <vector>
#include <cstdlib>   // malloc/free, abs(int)
#include <cmath>
#include <cassert>
#include <algorithm>
#include <deque>
using namespace std;

static const float pi_f = 3.14159265358979323846f;
enum {poly_size = 3};
// Simple pool allocator: carves chunks (rounded up to multiples of
// 16 bytes) out of malloc'd pools and releases everything at once
// in purge().
class Alloc
{
public:
    Alloc(): head(0)
    {
    }

    ~Alloc()
    {
        purge();
    }

    void purge()
    {
        while (head)
        {
            Pool* next = head->next;
            free(head);
            head = next;
        }
    }

    void* allocate(int size)
    {
        // Round the request up to a multiple of 16 bytes.
        size = (size + 15) & (~0x0f);
        if (head && (head->used + size) < head->capacity)
        {
            void* mem = head->mem + head->used;
            head->used += size;
            return mem;
        }
        int new_pool_size = 4096;
        if (size > new_pool_size)
            new_pool_size = size;
        Pool* new_pool = static_cast<Pool*>(malloc(sizeof(Pool) + new_pool_size));
        new_pool->used = size;
        new_pool->capacity = new_pool_size;
        new_pool->next = head;
        head = new_pool;
        return new_pool->mem;
    }

private:
    struct Pool
    {
        Pool* next;
        int used;
        int capacity;
        char mem[1];
    };
    Pool* head;
};
struct Vertex
{
    Vertex(): x(0), y(0), z(0) {}
    float x, y, z;
};

// Per-vertex adjacency: the polygons that reference a given vertex.
struct VertexPolys
{
    VertexPolys(): num_polys(0), polys(0) {}
    int num_polys;
    int* polys;
};

struct Poly
{
    Poly()
    {
        vertices[0] = -1;
        vertices[1] = -1;
        vertices[2] = -1;
    }
    int vertices[poly_size];
};

// Ordered set of indices: remembers the order of first insertion
// and ignores duplicates.
struct IndexSet
{
    explicit IndexSet(int n): num(0), data(n), used(n, false) {}
    void insert(int index)
    {
        if (!used[index])
        {
            data[num++] = index;
            used[index] = true;
        }
    }
    int num;
    vector<int> data;
    vector<char> used;
};
struct Mesh
{
    // Applies a permutation to the vertex array: the vertex at old
    // index new_order[i] moves to new index i, and all polygon
    // indices are remapped to match.
    void reorder_vertices(const vector<int>& new_order)
    {
        assert(new_order.size() == vertices.size());
        vector<int> to_new_order(new_order.size());
        for (size_t i=0; i < new_order.size(); ++i)
            to_new_order[new_order[i]] = static_cast<int>(i);
        for (size_t i=0; i < polys.size(); ++i)
        {
            Poly& poly = polys[i];
            for (int j=0; j < poly_size; ++j)
                poly.vertices[j] = to_new_order[poly.vertices[j]];
        }
        vector<Vertex> reordered_vertices(vertices.size());
        for (size_t i=0; i < new_order.size(); ++i)
            reordered_vertices[i] = vertices[new_order[i]];
        vertices.swap(reordered_vertices);
    }

    vector<Vertex> vertices;
    vector<Poly> polys;
    vector<VertexPolys> vertex_polys;
};
static void create_sphere(Mesh* mesh, float radius, int rings, int sectors)
{
    const float ring_step = 1.0f / (float)(rings-1);
    const float side_step = 1.0f / (float)(sectors-1);
    const int total_verts = rings * sectors;

    // Create sphere vertices.
    vector<int> indices;
    indices.reserve(total_verts);
    for (int r=0; r < rings; ++r)
    {
        for (int s=0; s < sectors; ++s)
        {
            indices.push_back(static_cast<int>(mesh->vertices.size()));
            const float x = cos(2*pi_f * s * side_step) * sin(pi_f * r * ring_step);
            const float y = sin(-pi_f/2 + pi_f * r * ring_step);
            const float z = sin(2*pi_f * s * side_step) * sin(pi_f * r * ring_step);
            Vertex new_vertex;
            new_vertex.x = x * radius;
            new_vertex.y = y * radius;
            new_vertex.z = z * radius;
            mesh->vertices.push_back(new_vertex);
        }
    }

    // Create sphere triangles (two per quad of the ring/sector grid).
    for (int r=0; r < rings-1; ++r)
    {
        for (int s=0; s < sectors-1; ++s)
        {
            int npv[4] =
            {
                r * sectors + s,
                r * sectors + (s+1),
                (r+1) * sectors + (s+1),
                (r+1) * sectors + s
            };
            for (int j = 0; j < 4; ++j)
                npv[j] = indices[npv[j] % total_verts];

            Poly new_poly1;
            new_poly1.vertices[0] = npv[0];
            new_poly1.vertices[1] = npv[1];
            new_poly1.vertices[2] = npv[2];
            mesh->polys.push_back(new_poly1);

            Poly new_poly2;
            new_poly2.vertices[0] = npv[2];
            new_poly2.vertices[1] = npv[3];
            new_poly2.vertices[2] = npv[0];
            mesh->polys.push_back(new_poly2);
        }
    }
}
static Mesh create_mesh(Alloc& alloc)
{
    Mesh mesh;
    create_sphere(&mesh, 10.0f, 100, 100);

    // Shuffle the vertex order to make it all random (this tends to
    // reflect a real-world model better, where vertex/polygon
    // creation order can be very arbitrary).
    vector<int> new_vertex_order(mesh.vertices.size());
    for (size_t j=0; j < mesh.vertices.size(); ++j)
        new_vertex_order[j] = static_cast<int>(j);
    random_shuffle(new_vertex_order.begin(), new_vertex_order.end());
    mesh.reorder_vertices(new_vertex_order);

    // Count the number of polygons connected to each vertex.
    mesh.vertex_polys.resize(mesh.vertices.size());
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            ++mesh.vertex_polys[vertex_index].num_polys;
        }
    }

    // Allocate space for the vertex polygons.
    for (size_t i=0; i < mesh.vertex_polys.size(); ++i)
    {
        VertexPolys& vp = mesh.vertex_polys[i];
        vp.polys = static_cast<int*>(alloc.allocate(vp.num_polys * sizeof(int)));
        vp.num_polys = 0;
    }

    // Form the list of polygons connected to each vertex.
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            VertexPolys& vp = mesh.vertex_polys[vertex_index];
            vp.polys[vp.num_polys++] = static_cast<int>(i);
        }
    }
    return mesh;
}
static void output_stats(const Mesh& mesh, const char* type)
{
    // Measure the proximity (index distance) of each pair of vertices
    // in each polygon. A polygon counts as "optimal" if each pair of
    // its vertex indices is within 6 of each other.
    int num_optimal = 0;
    float prox_sum = 0.0f;
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        const int prox1 = abs(poly.vertices[1] - poly.vertices[0]);
        const int prox2 = abs(poly.vertices[2] - poly.vertices[1]);
        const int prox3 = abs(poly.vertices[0] - poly.vertices[2]);
        const float poly_prox = (prox1 + prox2 + prox3) / 3.0f;
        prox_sum += poly_prox;
        if (prox1 <= 6 && prox2 <= 6 && prox3 <= 6)
            ++num_optimal;
    }
    cout << "-------------------------------------------------" << endl;
    cout << type << endl;
    cout << "-------------------------------------------------" << endl;
    cout << "-- # Vertices: " << mesh.vertices.size() << endl;
    cout << "-- # Polygons: " << mesh.polys.size() << endl;
    cout << "-- # Optimal Polygons: " << num_optimal << endl;
    cout << "-- Average vertex proximity: " << (prox_sum / mesh.polys.size()) << endl;
}
// Linear heuristic: visit polygons in order and emit each vertex the
// first time it is encountered.
static void basic_optimization(Mesh& mesh)
{
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            index_set.insert(vertex_index);
        }
    }
    mesh.reorder_vertices(index_set.data);
}
// Breadth-first graph heuristic: flood-fill the polygons starting
// from polygon 0, emitting vertices in the order the traversal first
// reaches them.
static void breadth_first_optimization(Mesh& mesh)
{
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    assert(!mesh.polys.empty());

    deque<int> to_process;
    vector<char> processed(mesh.polys.size(), false);
    to_process.push_back(0);
    processed[0] = true;
    while (!to_process.empty())
    {
        const int poly_index = to_process.front();
        to_process.pop_front();
        const Poly& poly = mesh.polys[poly_index];
        for (int i=0; i < poly_size; ++i)
        {
            const int vertex_index = poly.vertices[i];
            index_set.insert(vertex_index);

            // Enqueue all unvisited polygons sharing this vertex.
            const VertexPolys& vp = mesh.vertex_polys[vertex_index];
            for (int j=0; j < vp.num_polys; ++j)
            {
                if (!processed[vp.polys[j]])
                {
                    to_process.push_back(vp.polys[j]);
                    processed[vp.polys[j]] = true;
                }
            }
        }
    }
    mesh.reorder_vertices(index_set.data);
}
int main()
{
    // Linear Heuristic
    {
        Alloc alloc;
        Mesh mesh = create_mesh(alloc);
        basic_optimization(mesh);
        output_stats(mesh, "Linear Heuristic");
    }
    // Breadth-First Heuristic
    {
        Alloc alloc;
        Mesh mesh = create_mesh(alloc);
        breadth_first_optimization(mesh);
        output_stats(mesh, "Breadth-First Graph Heuristic");
    }
}
Results:
-------------------------------------------------
Linear Heuristic
-------------------------------------------------
-- # Vertices: 10000
-- # Polygons: 19602
-- # Optimal Polygons: 198
-- Average vertex proximity: 67.0118
-------------------------------------------------
Breadth-First Graph Heuristic
-------------------------------------------------
-- # Vertices: 10000
-- # Polygons: 19602
-- # Optimal Polygons: 13
-- Average vertex proximity: 88.9976
The nature of the problem seems to boil down to mapping this 2D, highly-connected topological mesh grid into a 1-dimensional space with good locality. It's why I think these basic algorithms don't fare too well: the connectivity and seamlessness of the mesh make it impossible to map the mesh very effectively with a heuristic that traverses it in its entirety.
It's similar to texture atlas mapping, where we take a 3D mesh and 'unfold' it into a flat 2D space. To even solve that problem, we often need to introduce seams into the mesh so that we can break it down into 2D islands of connectivity. Here we're similarly breaking a 2D graph down into islands with good locality in 1D memory space.
So I think an optimal algorithm would break the mesh down into segments (sections) this way and apply these types of algorithms to each segment. This way we get good 1D memory locality within a given segment of the mesh, with the pathological cases being the outliers at the boundaries of each segment, where vertices are shared with other segments.
A variation of the breadth-first search could do this crudely if it only traversed to a breadth of K for a given centroid polygon, P, where K is some small number, before choosing a completely different P. That would isolate small connected 2D islands in a mesh that can be mapped to a 1-dimensional memory space with good locality within each island (just poor at the boundaries of those islands).
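Here's a rough, untested sketch of that K-limited variation, reusing the Mesh, IndexSet, and VertexPolys structures above (the function name, the way seeds are picked, and the final safety sweep are arbitrary choices on my part, and it assumes every vertex is referenced by at least one polygon, as in the sphere test):

// Hypothetical sketch: island-limited breadth-first ordering. Each
// island expands at most k polygons from its seed, so each island's
// vertices land contiguously in the final 1D order.
static void clustered_bfs_optimization(Mesh& mesh, int k)
{
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    vector<char> processed(mesh.polys.size(), false);
    deque<int> to_process;
    for (size_t seed=0; seed < mesh.polys.size(); ++seed)
    {
        if (processed[seed])
            continue;

        // Start a new island at this seed polygon.
        to_process.clear();
        to_process.push_back(static_cast<int>(seed));
        processed[seed] = true;
        int expanded = 0;
        while (!to_process.empty() && expanded < k)
        {
            const int poly_index = to_process.front();
            to_process.pop_front();
            ++expanded;
            const Poly& poly = mesh.polys[poly_index];
            for (int i=0; i < poly_size; ++i)
            {
                const int vertex_index = poly.vertices[i];
                index_set.insert(vertex_index);
                const VertexPolys& vp = mesh.vertex_polys[vertex_index];
                for (int j=0; j < vp.num_polys; ++j)
                {
                    if (!processed[vp.polys[j]])
                    {
                        to_process.push_back(vp.polys[j]);
                        processed[vp.polys[j]] = true;
                    }
                }
            }
        }
        // Polygons enqueued but never expanded were only marked to
        // dedupe within this island; unmark them so a later island
        // can pick them up as seeds.
        for (size_t i=0; i < to_process.size(); ++i)
            processed[to_process[i]] = false;
    }
    // Safety sweep: any vertex missed by the islands (e.g. in
    // polygons behind the seed cursor) still gets an index.
    for (size_t i=0; i < mesh.polys.size(); ++i)
        for (int j=0; j < poly_size; ++j)
            index_set.insert(mesh.polys[i].vertices[j]);
    mesh.reorder_vertices(index_set.data);
}

The safety sweep just falls back to the linear heuristic for anything the islands missed, so the worst case degrades to the first heuristic rather than producing a broken permutation.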
Anyway, I was mainly looking for formal algorithmic names for such a thing. I'm still not sure if this would be called 'sorting' -- perhaps I'm searching for the wrong thing.