Sunday, June 7, 2009

OpenGL ES from the Ground Up Part 8: Interleaving Vertex Data

Technote 2230 makes many suggestions for improving the performance of your iPhone apps that use OpenGL ES. You're now far enough along in your understanding of OpenGL ES that you should read it. No, really. Go read it, I'll wait.

Okay, done? Under the section titled Optimizing Vertex Data, there's a somewhat cryptic recommendation to "submit strip-ordered indexed triangles with per vertex data interleaved". When Apple makes a recommendation, they usually have a good reason for it, so let's look at how we comply with this one.

First of all, let's look at what it means. Let's break it down:

Strip Ordered: In other words, if your model has adjacent triangles, submit them as triangle strips rather than submitting each triangle individually. We've talked about using triangle strips in earlier installments, so you already know a little about doing that. It's not always possible to use triangle strips, but for a good many objects you will be able to, and whenever you can, you should, because using triangle strips greatly decreases the amount of vertex data you have to push into OpenGL ES every frame.

indexed: This is also nothing new. We've been using vertex indices for a while now. Our spinning icosahedron uses them to create twenty faces with only twelve vertices. glDrawElements() draws based on indices rather than vertices.

Heck, we're doing great so far, aren't we? So far, we seem to be doing all the right things! Let's look at the last part of the recommendation, however:

with per vertex data interleaved: Okay, hmm.. What the hell does that mean?

Okay, time to test out your memory. Do you remember, in several of the past installments, when we talked about functions like glVertexPointer(), glNormalPointer(), glColorPointer(), or glTexCoordPointer()? In earlier installments, I told you not to worry about the parameter called stride and to just set it to 0.

Well, now you can start worrying about stride, because that's the key to interleaving your per vertex data.

Per Vertex Data

So, you might be wondering what "per vertex data" is and how you would interleave it.

You remember, of course, that in OpenGL ES we always pass geometry in using a vertex array, which is an array containing sets of three GLfloats that define the points that make up our objects. Along with that, we also sometimes specify other data. For example, if we use lighting and need vertex normals, we have to specify one normal per vertex in our normal array. If we use texture coordinates, we have to make sure that our texture coordinate array has one set of texture coordinates per vertex. And if we use a color array, we have to specify one color per vertex. Do you notice how I keep saying "per vertex"? Well, these types of data are what Apple is referring to when they say "per vertex data" in that Technote. It's anything that you pass as an array into OpenGL ES that supplies any kind of data that applies to the vertices in the vertex array.

Interleaving

Up until now in this series, we've created one array to hold the vertex data, and additional separate arrays to hold the normal data, color data, and/or texture coordinate data, like so:

separatearrays.png


What we're going to learn how to do today is to smush all this data together into a single contiguous chunk of memory:

interleaved.png


Don't worry if you can't read the code in that illustration. When it becomes important, I'll give you the code listing again; the illustration is just there to make the point that we're going to have all of our vertex data in a single glob of memory. Doing that puts all the data describing a single vertex together in one place in memory, which allows OpenGL faster access to the information about each vertex. In today's installment, we're going to interleave vertices, normals, and color data, though the exact same technique would work for texture coordinates, or for just interleaving vertices and normals. In fact, in the accompanying Xcode Project, there are data structures defined to handle all three of those interleaving scenarios.

Defining a Vertex Node

In order for this to work, we need a new data structure. In order to interleave vertices, normals, and color data, we need a structure that looks like this:

typedef struct {
    Vertex3D vertex;
    Vector3D normal;
    Color3D color;
} ColoredVertexData3D;


Pretty straightforward, huh? You just create a struct with each piece of per-vertex data that we're using.

Next, of course, we need to populate our vertex data, so we need to combine those three static const arrays into a single one. Here's what the same icosahedron data looks like specified using a static array of our new datatype:

static const ColoredVertexData3D vertexData[] = {
    {
        {0, -0.525731, 0.850651},  // Vertex |
        {0.000000, -0.417775, 0.675974},  // Normal | Vertex 0
        {1.0, 0.0, 0.0, 1.0}  // Color |
    },
    {
        {0.850651, 0, 0.525731},  // Vertex |
        {0.675973, 0.000000, 0.417775},  // Normal | Vertex 1
        {1.0, 0.5, 0.0, 1.0}  // Color |
    },
    {
        {0.850651, 0, -0.525731},  // Vertex |
        {0.675973, -0.000000, -0.417775},  // Normal | Vertex 2
        {1.0, 1.0, 0.0, 1.0}  // Color |
    },
    {
        {-0.850651, 0, -0.525731},  // Vertex |
        {-0.675973, 0.000000, -0.417775},  // Normal | Vertex 3
        {0.5, 1.0, 0.0, 1.0}  // Color |
    },
    {
        {-0.850651, 0, 0.525731},  // Vertex |
        {-0.675973, -0.000000, 0.417775},  // Normal | Vertex 4
        {0.0, 1.0, 0.0, 1.0}  // Color |
    },
    {
        {-0.525731, 0.850651, 0},  // Vertex |
        {-0.417775, 0.675974, 0.000000},  // Normal | Vertex 5
        {0.0, 1.0, 0.5, 1.0}  // Color |
    },
    {
        {0.525731, 0.850651, 0},  // Vertex |
        {0.417775, 0.675973, -0.000000},  // Normal | Vertex 6
        {0.0, 1.0, 1.0, 1.0}  // Color |
    },
    {
        {0.525731, -0.850651, 0},  // Vertex |
        {0.417775, -0.675974, 0.000000},  // Normal | Vertex 7
        {0.0, 0.5, 1.0, 1.0}  // Color |
    },
    {
        {-0.525731, -0.850651, 0},  // Vertex |
        {-0.417775, -0.675974, 0.000000},  // Normal | Vertex 8
        {0.0, 0.0, 1.0, 1.0}  // Color |
    },
    {
        {0, -0.525731, -0.850651},  // Vertex |
        {0.000000, -0.417775, -0.675973},  // Normal | Vertex 9
        {0.5, 0.0, 1.0, 1.0}  // Color |
    },
    {
        {0, 0.525731, -0.850651},  // Vertex |
        {0.000000, 0.417775, -0.675974},  // Normal | Vertex 10
        {1.0, 0.0, 1.0, 1.0}  // Color |
    },
    {
        {0, 0.525731, 0.850651},  // Vertex |
        {0.000000, 0.417775, 0.675973},  // Normal | Vertex 11
        {1.0, 0.0, 0.5, 1.0}  // Color |
    }
};


Here is how we pass the information into OpenGL. Instead of passing in the pointer to the appropriate array, we pass the address of the appropriate member of the first vertex in the array, and provide the size of that struct as the stride argument.

glVertexPointer(3, GL_FLOAT, sizeof(ColoredVertexData3D), &vertexData[0].vertex);
glColorPointer(4, GL_FLOAT, sizeof(ColoredVertexData3D), &vertexData[0].color);
glNormalPointer(GL_FLOAT, sizeof(ColoredVertexData3D), &vertexData[0].normal);


The last parameter in each of those calls points to the data corresponding to the first vertex. So, for example, &vertexData[0].color points to the color information for the first vertex. The stride parameter identifies how many bytes separate the start of one vertex's data from the start of the same type of data for the next vertex. That might make a little more sense if you look at this diagram (sorry, it's wide; you may have to expand your browser to see all of this one):

stridediagram.png


What could be easier, right? If you don't feel like typing it all in, you can download the interleaved version of the spinning icosahedron. I've also updated my OpenGL ES Xcode Template with these new data structures.

We're still not using triangle strips, but merging triangles into triangle strips is going to have to be a subject for a future installment, because it's time to go meet some people at WWDC.



16 comments:

Michael said...

I first learned of this optimization from the lecture given recently at Stanford (by the ngmoco employee whose name escapes me at the moment) as part of their iPhone programming course. Since then, I've been wondering what makes this technique such a performance gain. Is it strictly about having all of the data for each vertex packed together, thereby reducing the amount of cache thrashing as the GPU processes the verts?

Tyler Weir said...

Thanks for another great article Jeff. And just like Michael above, the Stanford lecture on iTunes U by Tim Omernick of ngmoco:) is excellent.

We're redoing the rendering pipeline of our game Evasion to align with these best practices.

Thanks!

Tyler Weir said...

That should be:
just like Michael *said* above

Quinn Taylor said...

Just a minor niggle that led to someone asking a related question on StackOverflow.com...

In [ProjectName]AppDelegate.m file, lines 25-27...

GLViewController *theController = [[GLViewController alloc] init];
self.controller = theController;
[theController release];


These could be reduced to a single, simpler line:

self.controller = [[[GLViewController alloc] init] autorelease];

If the controller were a temporary local object, I could understand avoiding autorelease to be sure the object doesn't linger in memory longer than is necessary. However, in this case the controller will be around until the memory is deallocated, so there is no harm in adding it to the autorelease pool rather than using an explicit -release.

Just a thought — I think it makes the code that much simpler. Thanks for your great posts!

Jeff LaMarche said...

Quinn:

Thanks for your feedback. Generally speaking, in both the book and for my blog postings, I usually avoid nesting message calls. In the book, we're pretty strict about it, and try to only nest alloc/init calls. For blog postings, I sometimes will nest messages, but generally try to avoid it.

You are right that in that particular chunk of code, there's no harm in using the autorelease pool other than losing a negligibly tiny bit of memory to tracking the view controller's pointer in the pool. I still prefer to keep it this way for two reasons.

First, many people who read my blog are relatively new to Objective-C, and for them, nested messages can be a little intimidating and make the code harder to read. The only real expense to avoiding nesting is a little vertical space, and for those unaccustomed to bracket message notation, I think it makes the code much more readable.

Second, many of those same people who are new to Objective-C are still struggling with memory management. I want to reinforce good habits by sticking to the iPhone dev mantra of "if you don't need the autorelease pool, don't use it".

For people like you who understand the memory rules, by all means, if you prefer nesting functions and using an autorelease call when it won't hurt, go for it. Code style is highly personal, and you should do what makes the code most readable to you (unless, of course, you work somewhere that has a defined code style, in which case you should stick to that).

But, for pedagogical purposes, I still think it's better to avoid much message nesting and to stick with the textbook approach to memory allocation. So, thanks for your suggestion, but I'm going to leave it the way it is in the sample project.

Jeff LaMarche said...

Michael:

The only specifically documented reason for this is memory access speed: it's faster to access contiguous chunks of memory, or chunks that are near each other, than it is to jump back and forth.

There could very well be other reasons, but I don't have enough knowledge of the underlying implementation, and I have never seen other reasons enumerated.

I have also been told that you can gain some additional speed by aligning your vertices on dword boundaries. In this sample, that's unnecessary because we're using GLfloat, which just happens to be 32 bits (or a double word) each, but if you're using GLubyte or GLint to specify some of the elements, sometimes it can be faster to insert a byte or two of padding between your vertices, or at least so I've heard.

hassanvfx said...

Hi master !!

I got your book Beginning iPhone Development. Thanks for writing it!

I'm sure you can shed some light on my research.

First of all, I tried to do this with CALayers, but it is turning out very slow.

So I need to make a stack of planes, all of the same size, like CALayers, showing textures that create a composite image viewed from the top.

WHAT I'M MISSING IS... how could I detect which plane was "TOUCHED", depending not on the geometry (like GL_SELECT or a ray test) but on the visible mask area of the plane?

I would be thankful for any advice.

Spaced Cowboy said...

Hmm, I don't suppose you know a good way to get the functionality of gluUnProject(...), do you? I'm struggling to read the Z buffer on the iPhone (needed for the Mesa implementation, and in the NeHe tutorials), so if you're looking for things to talk about [grin] ...

Tore said...

Hi Jeff, great tutorials! I have a question concerning the usage of triangle lists vs strips. You take the recommendation "strip ordered" to mean we should use triangle strips. However, after reading the "POWERVR 3D Application Development Recommendations" document (you can find it by searching for "recommendations" on imgtec.com) from the iPhone's graphics chip manufacturer, it seems that triangle lists are more efficient for the iPhone's hardware, because a triangle list needs one draw call only, whereas strips need one for each strip. The list of triangles should be sorted the same as in strips, however.

Mark said...

Technote 2230. Can't find it now. Anybody know a fresh link ? Thanks !

transmitterloc said...

technote 2230 this may be it

http://developer.apple.com/iphone/library/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/Performance/Performance.html

Joe said...

I think this is (a copy of) the technote 2230:

http://gsdwrdx.tistory.com/92
