Tuesday, July 21, 2009

To VBO or Not to VBO...

Daniel Pasco of Black Pixel Luminance has an interesting blog post today on the comparative performance between using VBOs and not using VBOs.

Surprising results, to say the least, and it really makes you wonder why Apple is recommending VBOs for the iPhone.



17 comments:

warmi said...

Well, I am not sure exactly what he was comparing, but the main improvement is on the CPU side, and unless the CPU is already being stressed, obviously it won't make any difference. In other words, any speed improvement will not show up unless your app is CPU limited (which most 3G apps were once you were pushing more than 10-20K vertices, because of the copying of vertices around).

Chris said...

Interesting article. Would be nice to know the exact number of vertices. I messaged dlpasco on twitter to see if he could share those numbers with us all. Great find.

Jeff LaMarche said...

According to Daniel, he ran the tests over a range of 256 vertices up to 32,000 vertices.

Chris said...

Jeff, thanks for the follow-up. I suppose Daniel did some sort of averaging on the results, then? It would be really nice to see under the hood of what he did so that we can verify the usage of VBOs. It's great, though, that he performed tests of this nature.

Chris said...

One final thing. Just found Daniel's twitter account. He posted:

"I'm getting a lot of questions about the testing I did. The results were for 1k verts. Trend was verified on sets ranging from 256 to 32k."

warmi said...

OK, I just tested a model with 45,894 vertices / 15,298 faces, plus 3 point lights and 1 directional light; no textures, just coords + normals.


Here are my results (3GS using OpenGL ES 1.x):

No VBOs - ~30 FPS
VBO - GL_DYNAMIC_DRAW - ~30 FPS
VBO - GL_STATIC_DRAW - 60 FPS (could be even higher for all I know, since we are capping at 60 FPS)

So there is a very clear difference, but you have to use GL_STATIC_DRAW; otherwise VBOs behave just like client arrays.

dlpasco said...

warmi,
I would *love it* if you were right. I'm using GL_STATIC_DRAW for my own code and am definitely seeing the results I published today, even on my 3GS - no change of any kind between client side arrays and VBOs.
I'll be publishing my code this weekend (still doing some cleanup on it and double checking myself to see if I've screwed anything up anywhere). Once I've got the code up I'd be thrilled if you could tell me what I'm doing wrong here. :)

-Daniel

warmi said...

I tripled the triangle count and stripped everything to the bare minimum: just one light + 130K vertices (9 models from CounterStrike Source). Here is what I am consistently getting:

http://www.warmi.net/tmp/test_vbo.png

It still holds: VBOs are essentially twice as fast as client arrays.
I verified it with Instruments' own FPS count just to make sure I wasn't missing anything.

PS.
It only applies to 3GS of course.
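
For anyone who wants to cross-check numbers like these inside their own test app, here is a minimal C sketch of a run-averaged frame counter. It is not the code used by anyone in this thread; `FpsCounter` and its functions are hypothetical names, and the timestamps are assumed to be seconds from whatever clock the app already uses.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original tests): averages the frame
   rate over a whole run instead of reporting a single noisy frame. */
typedef struct {
    double first_timestamp; /* seconds, time of the first recorded frame  */
    double last_timestamp;  /* seconds, time of the latest recorded frame */
    size_t frame_count;     /* number of frames recorded so far           */
} FpsCounter;

static void fps_counter_init(FpsCounter *counter)
{
    counter->first_timestamp = 0.0;
    counter->last_timestamp = 0.0;
    counter->frame_count = 0;
}

/* Call once per rendered frame with the current time in seconds. */
static void fps_counter_record(FpsCounter *counter, double timestamp)
{
    if (counter->frame_count == 0) {
        counter->first_timestamp = timestamp;
    }
    counter->last_timestamp = timestamp;
    counter->frame_count++;
}

/* Average frames per second over the run: frame intervals divided by
   elapsed time. Returns 0.0 until at least two frames are recorded. */
static double fps_counter_average(const FpsCounter *counter)
{
    double elapsed;

    if (counter->frame_count < 2) {
        return 0.0;
    }
    elapsed = counter->last_timestamp - counter->first_timestamp;
    if (elapsed <= 0.0) {
        return 0.0;
    }
    return (double)(counter->frame_count - 1) / elapsed;
}
```

Calling `fps_counter_record` once per frame and reading `fps_counter_average` at the end of a run gives one figure per configuration (client arrays, GL_DYNAMIC_DRAW, GL_STATIC_DRAW), which makes runs easier to compare than eyeballing a live counter.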

Noel said...

It's a well-known fact that the OpenGL driver on the iPhone 3G blows, and they pretty much ignore VBOs and transform the data on the CPU and send it over to the GPU every frame.

On the other hand, Apple engineers told me the OpenGL ES 2.0 driver (for 3GS) is fixed, so any code running with VBOs is going to be much, much faster.

Since it makes no difference, you might as well use VBOs. Can't really go wrong that way.

Jeff LaMarche said...

I'm curious what the difference between Warmi's and Daniel's tests is. I wonder if there's some setting or functionality that causes you to lose the benefit of the VBO? If so, that would certainly be good to know.

warmi said...

I don't really know at this point, but one thing I did notice looking at the results of Instruments (OpenGL ES) is that with the VBO-based pipeline, something called "Command Buffer Submitted Bytes" is about 10 times lower than when rendering with client arrays.
That's about the only difference (according to Instruments, at least).

Charlie said...

Real glad to read about folks digging into this issue. I've had questions about the merits of VBO's, but hadn't done any performance tests yet.

On a related note, for those of us too poor to buy a MacBook Pro running OpenGL 2.0, my vanilla MacBook circa late 2007 has OpenGL v1.2 on it which does *not* support VBO's. Took me a while to figure out why my VBO code ran fine on my iPhone but was forked up on the iPhone Simulator.

dlpasco said...

warmi,
I'm still not able to verify the results you are seeing. I did get one run in which there appeared to be improved performance using VBOs, but I was unable to reproduce that result, and neither have any of my coworkers when they run the code on their own 3GSs.

I'd like to compare results with you and see what the disconnect is. I agree that the Command Buffer Submitted Bytes drops dramatically when using VBOs, but I haven't seen any change in the bottom line - FPS.

-Daniel
http://twitter.com/dlpasco

warmi said...

I am currently playing with my test, trying to understand what is going on.

So far:

1. Scaling my models by 0.2 didn't change anything, which is what I was expecting; I am certainly not fill-rate bound.
2. This is interesting: I kicked off my test with Instruments/Activity Monitor and I am seeing a huge difference in CPU % between the two tests. Essentially, the VBO-based test stays around 8-9% while the non-VBO test is consistently burning up my CPU at 75-80%.
3. Repeating my tests with Instruments/CPU Sampler reveals a lot of time being spent within the glDrawElements call when not using VBOs. On the other hand, the VBO-based test doesn't register anything specific.

http://www.warmi.net/tmp/vbo_trace.png ( VBO )

http://www.warmi.net/tmp/ca_trace.png (CA)

The models are exported from Lightwave using a custom exporter (which in turn internally uses Nvidia's NvTriStrip library to optimize the data).
I am rendering 4 batches: 3 models (containing 3 soldiers each) + a background quad, altogether around 136K vertices (using GL_TRIANGLES). There is a single point light affecting everything except the background quad.

Each model is submitting an interleaved VBO composed of two vertex streams:
GL_VERTEX_ARRAY
GL_NORMAL_ARRAY

As you can see, the test is designed to stress the vertex/VBO submission pipeline more than anything.
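
To put rough numbers on that, here is a minimal C sketch of an interleaved position + normal layout and the amount of vertex data it implies. The struct and function names are hypothetical, the layout is only assumed to match the two streams listed above, and the per-frame figure only matters if, as suggested elsewhere in this thread, client arrays get copied again on every draw.

```c
#include <stddef.h>

/* Hypothetical interleaved layout: one position and one normal per vertex,
   mirroring the two streams named above (GL_VERTEX_ARRAY, GL_NORMAL_ARRAY). */
typedef struct {
    float position[3];
    float normal[3];
} InterleavedVertex;

/* Byte distance between consecutive vertices. This is the value that would
   be passed as the stride argument of glVertexPointer and glNormalPointer. */
static size_t vertex_stride(void)
{
    return sizeof(InterleavedVertex);
}

/* Byte offset of the normal inside one vertex. With a VBO bound, this offset
   (cast to a pointer) would be the last argument of glNormalPointer. */
static size_t normal_offset(void)
{
    return offsetof(InterleavedVertex, normal);
}

/* Vertex data touched per frame if every vertex is re-sent each frame. */
static unsigned long bytes_per_frame(unsigned long vertex_count)
{
    return vertex_count * (unsigned long)sizeof(InterleavedVertex);
}

/* Same figure scaled by an assumed frame rate. */
static unsigned long bytes_per_second(unsigned long vertex_count,
                                      unsigned long frames_per_second)
{
    return bytes_per_frame(vertex_count) * frames_per_second;
}
```

With 4-byte floats the stride is 24 bytes and the normal starts at byte 12, so `bytes_per_frame(136000UL)` comes to 3,264,000 bytes. At an assumed 30 FPS that is just under 98 MB of vertex data per second that the CPU would have to move when nothing is cached on the GPU side.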

Not knowing what kind of test you are running, I can only guess, but my personal opinion is that the difference we are seeing is the result of your test being bound by some other limit (fill rate, perhaps?).

Rod said...

The main point of VBOs is static geometry. He was using dynamic data with GL_DYNAMIC_DRAW; Apple recommends VBOs for things that don't change often. It is also a GPU download optimization, so I don't think it would improve frame rate until the CPU starts slowing things down.
