brooksmoses | In which my brain melts. (Reply)

So, I seem to have figured out just how much work I could do today before my brain suddenly melted out my ears, by expedient of doing enough work to cause it to do so. As it happens, the stuff that I was working out fit fairly well into a narrative format, and I was thinking of making a blog post out of it, so I decided to write it up as one while I was working on it, and thus have a record of it in case anyone's interested. This is, I think, a fairly reasonable chronicle of how my brain works when I'm thinking about this sort of thing, albeit slightly cleaned up and put into nice neat order.

I should also add a disclaimer that I have no idea whether this is going to make sense to anyone but me; a large part of it is only minimally-cleaned-up stream-of-consciousness writing, and may well make no sense at all to anyone else. And I know it could stand to have some diagrams added. I'm mainly just posting it to see if the brain-melty is contagous.

Long, complicated data-structure design behind cut. You've been warned.

This post is somewhat of an experiment. I'm writing a program to process some data from a colleague, and the current challenge is to write a data structure to contain the data once it's loaded. The data is on an unstructured triangular mesh; that is, each data point corresponds to a random point somewhere in the space, and these points are connected to make a random mesh of triangles. (They're not really random, of course; they're set up so that the triangles are all relatively close to equilateral, and are small where more detail is needed and large where it isn't, but they're essentially in random order.)

This is an extremely common type of data, so I could easily go read piles of papers of optimal representations, but for now I'd rather design my own. I could use the practice, and it's an interesting challenge. So I'm going to try working out the design as I write it up in this post; I thought it through somewhat last night, but I haven't worked out the details, so I'll do that as I go along.

Problem Definition

For this sort of data, there are three types of things one wants to work with: points (consisting of one data point), edges (consisting of a directed link between two data points), and triangles (consisting, depending on perspective, of three points or three edges). The data representation needs to provide the capability to move between the three things – for instance, given a triangle, we should be able to determine the points and edges that are part of it, and given a point we should be able to determine which triangles it's a part of.

There are a range of ways to represent this. At one extreme are the data files themselves, which are simply a list of which points make up each triangle. This has the advantage of taking up a minimal amount of space, and not storing redundant information. However, to determine which triangles contain a given point, one has to iterate through the entire list of triangles and check each one. In fact, any operation other than determining which points are in a triangle takes an iteration through the entire list (or, at least, through as much of the list as it takes to find a match.)

The other extreme is to store a list of every possible pointer. (Here, I don't mean "pointer" in the computer-language sense; these are usually just numbers corresponding to positions in the list of points, edges, or triangles.) So, for a triangle, one would store three pointers to the points in the triangle, three pointers to the edges of the triangle, and three pointers to the adjacent points. An edge has two pointers to its endpoints and two pointers to the triangles on either side. And points have lists of pointers to the triangles and edges that they're part of. Because there's no fixed number for how many triangles and edges connect at a point, these lists must be implemented in some dynamic way that doesn't require them all to be the same length. For this representation, any lookup operation can be done directly, but there's a tremendous amount of storage of mostly-redundant data involved.

As with nearly all programming problems of this sort, what we want is something between these extremes. Ideally, we want something almost as fast as the latter method, but without nearly the storage requirement. In particular, it would be nice to avoid the arbitrary-length lists for the points.

Defining the Data Structure

To start with, I'm going to ignore edges for now. I don't need them for the data that I'm doing (since I don't have any calculations that require storing data on an edge), and that simplifies the problem.

So, let's start with a triangle. A triangle has three points, and three adjacent triangles. Let's give all of these a per-triangle number as well as their global order. For convenience, require that the points are numbered in a counterclockwise order starting from zero (we start at zero because we'll be doing lots of modulo-3 arithmetic with them), and the adjacent triangles are numbered such that triangle 1 joins on the edge from point 1 to point 2, triangle 2 shares point 2 and point 0, and triangle 0 shares point 0 and point 1.

There's one more piece of data that we might want to store for each adjacent triangle, that I haven't mentioned yet: What is that triangle's number for this triangle? This will turn out to be quite useful later, so let's store that as well. We'll see why when we consider the points.

For a point, now, it turns out that we only need to store two pieces of data: we need the number of one triangle that contains the point, and what that triangle's number for this point is. If we have that, and the adjacent-triangle information, it turns out that we already have the list of all of the triangles that contain the point, as a linked list that happens to be cleverly hidden in the triangle data.

How does that work? Well, for a point P we have one triangle; call it A. If the point P is numbered i on triangle A, then the next triangle (call it B) in counterclockwise order around the point is stored as adjacent-triangle number i of A. So, that's two triangles. From A's data, we find that A is adjacent-triangle number j of B; thus, the next triangle around P must be B's adjacent-triangle number j+1 (mod 3). And, from there, we can simply repeat this until we get back to A. If, for some reason, we want to iterate clockwise rather than clockwise, the process is very similar.

And thus, that set of data is enough to contain a complete mapping from triangle to point, triangle to triangle, and point to triangle, without a need for either arbitrary-length lists or iterating over the entire global list of points and triangles. We could do away with storing the "what is the adjacent triangle's number for this triangle/point?" data, since that can be found easily by checking each of the three possibilities, but I suspect it's worth the space to store it.

Now for all the complications. (What, you thought it was really that simple? Ha!)

First, some points don't have a complete chain of triangles encircling them; they're at the edge of the domain. It's not immediately obvious why we might care; if the point is simply to go through all the triangles around a point, we should just store the next triangle in counterclockwise order around the point, whether it's adjacent or there's a gap. But: What if we're iterating around the other point, in clockwise order? If there's a gap, then this would point to a triangle that's not even adjacent to the other point at all!

Fundamentally, this causes a problem: we have a spot for one pointer in the data structure, and we actually need two -- one for a counterclockwise iteration around one point, and one for a clockwise iteration around the other point. I'll deal with this by making an assumption: any given point is only on one edge at a time. Thus, any point has at most one gap around it; there are no cases where two triangles touch at one point with gaps at both sides. Then, for any given point, we only need one extra pointer. And it turns out that there's one we haven't quite used up yet: the point's pointer to the first triangle. If this points to the next triangle for a counterclockwise iteration (that is, the "starting edge" for a counterclockwise iteration), then the adjacent-triangle number can point to the next triangle for a clockwise iteration.

The only remaining problem is how to tell when we get to the end of a counterclockwise iteration by having reached the mush boundary, since this no longer points back to the starting triangle. A fix for that is to use the sign bit of the pointer -- since these are integers, and Fortran doesn't have an unsigned integer type and starts arrays at 1 rather than 0, the sign bit is present and unused, and we can use it to signal whether the triangle is adjacent to the boundary rather than to another triangle.

So, the "adjacent triangle" pointer, if negative, actually points to the next triangle along the mesh boundary in a clockwise traversal of the boundary. To do a counterclockwise traversal of the boundary, we instead take the point with the same number as the boundary-adjacent edge, find its "first triangle", go around that triangle one point counterclockwise, and repeat.

Creating The Mesh

The next complication is: how do we create this from the data files?

We'll handle this by induction. Suppose that we start with a partly-completed structure, and we wish to add an additional triangle to it. For the additional triangle, all we know are the numbers for the points it contains.

So, start with (for the sake of argument) point 0. If triangle 0 has been added to the structure already, then it will be one of the triangles around point 0, and it will share edge 0. So, we iterate through the triangles around point 0, and if one of them has the same edge (which we determine by looking for point 1 in the corresponding place), then we make a pointer that says that it is triangle 0 of the new triangle, and make the appropriate corresponding pointer from it to the new triangle.

The tricky bit here is that we need to iterate around all the triangles around point 0, even though the structure is incomplete. Thus, the linked list has to always be valid no matter what state the structure is in. A crucial problem: for a triangle with an arbitrary number of triangles around it, there will be an arbitrary number of gaps; however, the formulation for the linked list assumes that there is at most one gap. This is not a fatal complication, however; recall that the restriction on the number of gaps was only required so that both counterclockwise and clockwise iterations around points would continue properly across gaps. If we only use counterclockwise iterations while building the structure, we can relax this restriction while the structure is incomplete.

So, to start with, the list for each point is empty. This can be denoted by setting the "first triangle" pointer for each point to zero.

Next, we add one triangle to the list for a given point. It has one edge which participates in a counterclockwise linked list around this point; thus, we set this edge to the next triangle around that point – which is the triangle we are adding, as there are no more.

To add a second triangle, we iterate through the existing list, and check whether there are any shared edges. There are two possible shared edges; one for the counterclockwise edge of the new triangle, and one for the clockwise edge. These are essentially independent searches, as there is no guarantee that the iteration of "clumps" of triangles around the point will be in geometric order (for instance, supposing there are four triangles and four gaps around a point, a new triangle could have shared edges with the first and third triangles).

A simple way to deal with this is to first add the triangle to the list, then search the list for an edge shared with the counterclockwise edge of the new triangle and patch it together if found, and then search for an edge shared with the clockwise edge and again patch it together if found.

Adding the triangle to the list is relatively easy, though since the list is cyclic it does require an iteration through the list to find the tail of the list. The counterclockwise edge of the new triangle points at the first triangle, and the triangle that formerly pointed at the first triangle now points at the new triangle. This requires that we presume that there is a gap between the tail triangle and the first triangle as long as there are still triangles to be added; this is true at the onset, so we shall preserve it as an invariant.

Iterating through the triangles to find a shared edge with the counterclockwise edge of the new triangle is trivial (one just checks for the same second point of the edge); the tricky bit is to rearrange the list if one is found. Prior to the patching, the connections before and after the new triangle are considered gaps, and thus it is valid to move any block of triangles into that gap. Specifically, we can move all of the triangles between the new triangle and the shared-edge triangle into the gap, by pointing the new triangle at the shared-edge triangle, pointing the triangle that was pointing at the shared-edge triangle at the new triangle, pointing the triangle that was pointing at the new triangle (namely, the tail triangle) at the triangle the new triangle had been pointing to (namely, the first triangle).

Note that the latter operation is merely restoring the original connection of the list, and thus it may be more efficient not to break it in the first place -- which amounts to inserting the new triangle where it fits, rather than inserting it an moving it; only if it fits nowhere is it places in the gap between the first and tail triangles.

Also note that the block-moving operation should only occur if the blocks are not already in the right place!

The connection between the new triangle and the shared-edge triangle is not a gap. This may break our invariant regarding the gap between the first and tail triangles if the shared-edge triangle is the first triangle. Thus, if the firat triangle is the shared-edge triangle, the point's first-triangle pointer should be moved to point to the new triangle, so the gap is retained.

Next, we iterate through to find any shared edges with the clockwise edge. If one is found, we again may need to rearrange by moving blocks into gaps. The shared-edge triangle gets pointed at the new triangle, and the triangle that had pointed at the new triangle now points at ... well, there's an interesting question. We can no longer count on having the convenient gap on the other side of the new triangle, so we have to go hunting for one. The easiest way to do this is to start at the new triangle and iterate counterclockwise until we find one. Then, the triangle that had pointed at the new triangle is pointed at the triangle after the gap, and the triangle before the gap is pointed at the triangle the shared-edge triangle had pointed at. Again, there is a risk that this closes up the gap between the first and tail triangles, and so the point's first-triangle pointer should be pointed at the one of the triangles after the newly-relinked gaps.

One might ask whether there is a risk that a third gap will not be found. This lack of a third gap can only occur if the blocks are already correctly arranged, however (as there is no way to arrange only one or two gaps in an incorrect order), and so this is not a problem if we avoid rearranging unless it's needed.

However, there is one more issue: if rearrangement is not needed, then there is the possibility that the gap between the first and tail pointers just got closed up. In that case, we do need to iterate around until a gap is found, and point the point's first-triangle pointer at the triangle after the gap. In the case that no gap is found, which happens when all triangles have been added to an internal point, the first-triangle pointer can be left pointing to any of the triangles.

So, that's the process of adding triangles around a point. Note that all of this only sets the pointers that are in a counterclockwise direction around the point in question. This is perfectly acceptable as a starting version; the clockwise pointers are counterclockwise pointers around a different point, and will get processed when those points are considered. It also means that the points are processed in a completely independent manner, and the same code can be used to add the triangle to its second and third points.

(There is, however, a promise of a bit of optimization if we do consider the clockwise pointers in cases where patching together of triangles occurs, as it avoids searching around the next point. Given how complicated things are already, however, this is almost assuredly not worth the programming time.)

Simplifications

That is all quite complicated, much more so than I thought it would be. Moreso than it should be, too, I think. So, how can it be simplified?

One simplification is the hunting for a third gap when doing the clockwise edge. We already know where a third gap is: if the counterclockwise edge didn't match then it's at a third gap (which the first-triangle pointer is conveniently already pointing at); if it did match, then either the first triangle was uninvolved and is still in front of a gap, or...

...well, we had three, right? And we used one, so the other two have to still be somewhere, and presumably one of them would be useful for this, if I could figure out which....

(At this point, my brain entirely melted and dripped out my ears, and I stopped for the day.)

In which my brain melts.

Problem Definition

Defining the Data Structure

Creating The Mesh

Simplifications

Post a comment in response: