Warming up: motion demo

I'm very interested in reducing load times for Pyweek games; I like to jump right into the action. So I tried an experiment. I have a 3-D model with 60 animation frames that takes about 20 seconds to load into PyOpenGL, and my experiment was to load just 4 of those frames at startup and add the rest as the game runs. I'm guessing you shouldn't generate a GL list in one thread while rendering the scene in another, so all the loading has to happen between game frames. Each animation frame, though, takes about 300 ms to generate into a GL list, which is far too long to fit into a single game frame. But I found out that you can chain together a sequence of GL lists, and it works just as well. So each animation frame is made of 20 GL lists, and one of those lists gets generated each game frame (roughly 300 / 20 = 15 ms of work per game frame). I'm still not very familiar with GL optimizations, though, so this might not be the way to go.
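The chunked scheme above can be sketched roughly as follows. This is a minimal sketch, not the demo's code: `IncrementalAnimation`, `split_into_chunks` and the `compile_chunk` callable are hypothetical names. The actual GL work (wrapping one chunk of faces in glNewList(list_id, GL_COMPILE) ... glEndList()) is left to `compile_chunk`, so the scheduling logic runs without a GL context.

```python
from collections import deque

# Hypothetical helper names for illustration; not taken from the demo.


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous slices whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


class IncrementalAnimation:
    """Compile an animation's display lists one chunk per game frame.

    compile_chunk(faces) should build one display list for those faces and
    return its id (with PyOpenGL: glNewList(list_id, GL_COMPILE) ... glEndList()).
    """

    def __init__(self, frames, chunks_per_frame, compile_chunk):
        self.chunks_per_frame = chunks_per_frame
        self.compile_chunk = compile_chunk
        self.list_ids = [[] for _ in frames]  # compiled chunk list ids, per animation frame
        # Queue chunks in animation order, so compiled frames always form a prefix.
        self.pending = deque(
            (index, chunk)
            for index, faces in enumerate(frames)
            for chunk in split_into_chunks(faces, chunks_per_frame)
        )

    def step(self):
        """Compile at most one chunk. Call once per game frame."""
        if self.pending:
            index, chunk = self.pending.popleft()
            self.list_ids[index].append(self.compile_chunk(chunk))

    def preload(self, n_frames):
        """Compile the first n_frames animation frames up front (e.g. 4 of 60)."""
        for _ in range(n_frames * self.chunks_per_frame):
            self.step()

    def is_ready(self, index):
        return len(self.list_ids[index]) == self.chunks_per_frame

    def frame_to_draw(self, wanted):
        """Return the wanted frame if fully compiled, else the nearest earlier ready one."""
        for index in range(wanted, -1, -1):
            if self.is_ready(index):
                return index
        return None
```

At draw time you would call glCallList(list_id) for each id in `anim.list_ids[anim.frame_to_draw(wanted)]`, and call `anim.step()` once per game frame. Falling back to the nearest earlier compiled frame is just one way to handle frames that aren't ready yet.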

I'm pretty satisfied with the result. The framerate and load times are about the same whether you preload everything or load it during the action. It starts off pretty jittery, as you'd expect, but I think it's a good tradeoff. In an actual game it would pay off even more, because you'll have more than one action: there's no need to load the shooting frames before the game begins if you'll have plenty of time before the player even gets a gun.

Anyway, just thought I'd share. Anyone with more GL experience is welcome to give me tips! :)

motion demo


Comments

The real goal here is to minimise the number of GL calls you're making (calls across the Python-C bridge are always slow). Display lists are a good optimisation here, and even when they're not the fastest thing for the video card, they can be a net win because they only require a single GL call to use.

However, when you're compiling your GL lists, you're using bucketloads of GL calls. The OBJ loading code which you're using is based around immediate mode (glVertex3f and friends). This means multiple GL calls per vertex, which is going to slow things to a crawl for geometry of any level of sophistication. The secret is to make use of GL's vertex array functionality - I'd suggest having a look at glDrawArrays and related functions.
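To put rough numbers on "multiple GL calls per vertex", here is a back-of-the-envelope call-count model. It's a simplification with assumptions of my own: each vertex sends a normal and a position (glNormal3f + glVertex3f), each face is wrapped in glBegin/glEnd as in the per-face loader, the array path enables each array once and then sets pointers and issues one glDrawArrays per material, and material/state calls are ignored on both sides. The function names are hypothetical.

```python
# Hypothetical call-count model; material and other state-setting calls are ignored.


def immediate_mode_calls(n_faces, verts_per_face=3, attribs_per_vertex=2):
    """glBegin + glEnd per face, plus one call (glNormal3f, glVertex3f, ...)
    per attribute per vertex."""
    return n_faces * (2 + verts_per_face * attribs_per_vertex)


def vertex_array_calls(n_materials, n_arrays=2):
    """glEnableClientState/glDisableClientState once per array, then per
    material one gl*Pointer call per array and a single glDrawArrays."""
    return 2 * n_arrays + n_materials * (n_arrays + 1)


if __name__ == "__main__":
    # A 2,000-triangle model with 3 materials.
    print(immediate_mode_calls(2000), vertex_array_calls(3))  # 16000 13
```

So for a modest model, compiling the immediate-mode version means on the order of 16,000 trips across the Python-C bridge versus about a dozen, even though calling the finished display list is a single call either way.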
I must concur with Martin: glDrawArrays is the way to do it. The latest OpenGL spec even goes as far as declaring glVertex and its kin obsolete in favor of arrays. It requires more calls from a Python program to draw a single model, but performs fewer real OpenGL operations, which can yield better performance with complex models. And, more relevant to your question, you can pickle these arrays, which allows blazing fast load times.

You can find an example of draw arrays and caching in an old revision of md2loader from the flightgame.
Well, technically client-side vertex arrays are deprecated too (OpenGL 3.1 drops them from the core spec), and we're supposed to be feeding glDrawArrays from VBOs, but I can't see that change happening any time soon. Pickling the arrays is of course an excellent idea, and one I should have mentioned: if you're going to do precomputation, you might as well do all the precomputation you can.
Cool, thanks! I tried replacing each glBegin(GL_POLYGON) ... glEnd() block with a glDrawArrays(GL_POLYGON, ...) call, and it sped up the GL list compilation by a factor of 2. That's a great improvement, but it still seems slow enough to worry about.

I reckon you could optimize this greatly by triangulating all the faces, and then a series of faces with the same material could be combined into a single glDrawArrays(GL_TRIANGLES...) call. This could potentially make the entire GL list compilation almost instantaneous. I'll try that by exporting the model from Blender triangulated.
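Here's a sketch of that per-material batching (one glDrawArrays(GL_TRIANGLES, ...) per material). It assumes a hypothetical parsed-OBJ layout: `positions` and `normals` are lists of 3-tuples, and each face is `(material_name, [(position_index, normal_index), ...])` with 0-based indices (OBJ files themselves are 1-based). It also triangulates in code with a simple fan, which is only correct for convex polygons; exporting triangulated from Blender avoids that restriction.

```python
# Hypothetical helpers; the face layout is an assumed parsed-OBJ format.


def fan_triangulate(polygon):
    """Split a convex polygon [c0, c1, ..., cn] into triangles (c0, ci, ci+1)."""
    return [(polygon[0], polygon[i], polygon[i + 1]) for i in range(1, len(polygon) - 1)]


def build_material_batches(positions, normals, faces):
    """Group faces by material into flat float lists for glDrawArrays.

    Returns {material: (vertex_floats, normal_floats)}; every 3 floats are one
    vertex, and every 3 vertices are one triangle.
    """
    batches = {}
    for material, corners in faces:
        verts, norms = batches.setdefault(material, ([], []))
        for triangle in fan_triangulate(corners):
            for vi, ni in triangle:
                verts.extend(positions[vi])
                norms.extend(normals[ni])
    return batches
```

Each batch could then be drawn with glVertexPointer(3, GL_FLOAT, 0, verts), glNormalPointer(GL_FLOAT, 0, norms) and glDrawArrays(GL_TRIANGLES, 0, len(verts) // 3), after setting that material's state once.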

Also, I'm still working with plain Python lists here. I'm going to try numpy arrays, but I bet it won't make a noticeable difference. There's very little difference between reading straight from the OBJ/MTL files and parsing these files and pickling the resulting objects, as I did for the demo above. I think Martin's right that the bottleneck, by an enormous margin, is the number of GL calls. If that's the case, it's really simple once you understand it; it's too bad most of the OpenGL material online isn't written with Python in mind.
I think you'll find that once you're doing a single glDrawArrays per material, it becomes really worthwhile to pre-parse the OBJ and store just the relevant lists. As it is, parsing doesn't seem slow only because you're doing a draw call (and several state-setting calls) per face, and that's dominating your run-time.
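Pre-parsing and caching can be as simple as the following sketch, which reuses a pickle stored next to the source file whenever the pickle is at least as new as the source. `load_cached`, the `.pickle` suffix and the `parse` callable are hypothetical; `parse` stands in for whatever turns an OBJ file into the per-material lists.

```python
import os
import pickle

# Hypothetical cache helper; parse() is whatever builds your per-material lists.


def load_cached(source_path, parse, cache_path=None):
    """Return parse(source_path), reusing a pickled copy when it is up to date."""
    cache_path = cache_path or source_path + ".pickle"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = parse(source_path)  # slow path: parse the OBJ text
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

Using a binary pickle protocol matters here; the default text protocol in older Pythons is much slower for large lists.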

I have been meaning to write something about OpenGL strategies for Python - as you say, most of the material online is focused on the API itself, and it's often not clear what advice transfers over and what doesn't.
So I tried it out, and you're absolutely right. I thought I'd post my results somewhere. Is there a better message board for OpenGL under Python? Anyway, here are the strategies I tried:
  • OBJ0: Extremely naive baseline. Doesn't even use display lists; it just renders all the polygons on demand, with one polygon render per face. Every other strategy uses a display list.
  • OBJ1: The strategy used in the objloader module from the Pygame wiki: one polygon render per face.
  • OBJ2: One glDrawArrays call per face, updating the vertex pointer each time.
  • OBJ3: One glDrawArrays call per material. Assumes everything is triangulated.
  • OBJ4: Same as OBJ3, but uses numpy arrays instead of Python lists.
  • OBJ5: One glDrawElements call per material, using a list of indices.
  • OBJ6: Same as OBJ5, but uses numpy arrays instead of Python lists.
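For the glDrawElements variant, one wrinkle is that OBJ indexes positions and normals separately, while glDrawElements uses a single index per vertex. One way to handle it, sketched below with a hypothetical helper name and the same kind of assumed 0-based `(position_index, normal_index)` corners, is to give each distinct pair its own slot in the vertex arrays and reuse that slot whenever the pair comes up again. This isn't necessarily how OBJ5 does it.

```python
# Hypothetical helper; corners are assumed 0-based (position_index, normal_index) pairs.


def build_indexed_batch(positions, normals, triangles):
    """Build flat vertex/normal lists plus an index list for glDrawElements.

    triangles: list of 3-tuples of (position_index, normal_index) corners.
    Each distinct corner is stored once and referenced by index afterwards.
    """
    slot = {}  # (position_index, normal_index) -> index into the output arrays
    verts, norms, indices = [], [], []
    for triangle in triangles:
        for corner in triangle:
            if corner not in slot:
                slot[corner] = len(slot)
                vi, ni = corner
                verts.extend(positions[vi])
                norms.extend(normals[ni])
            indices.append(slot[corner])
    return verts, norms, indices
```

After setting the vertex and normal pointers, the batch would be drawn with glDrawElements(GL_TRIANGLES, len(indices), GL_UNSIGNED_INT, indices). Shared corners are stored once, so the arrays are smaller than the flat glDrawArrays layout.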
I wanted to try one with VBOs as well, but I haven't figured them out yet. For each strategy I timed four steps: loading the objects from the OBJ files, pickling the objects, reloading the objects from a pickle file, and generating the display lists. The sum of the reloading and generating times is probably the number that matters, because that's what you'd pay when starting a game. I also measured the FPS when rendering a bunch of sprites:
       load  save reload     gen  fps
OBJ0  11447  8346   6261       0    0.026
OBJ1  11223  8330   5974  171049    4.673
OBJ2  11725  9405   6243  100975    4.651
OBJ3  15445  2569   4550   20880  302.626
OBJ4  15609   264    115     453  309.804
OBJ5  16438   677   1237    6825  337.610
OBJ6  15949   138     71     372  305.214
There's a surprisingly large difference in framerate when the number of GL calls within the display list is minimized. I expected the gen time to improve, but not necessarily the framerate. Using numpy arrays made a huge improvement in reload+gen time, and using an index list with glDrawElements instead of glDrawArrays made a smaller improvement.
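A harness for timing steps like these could look like the sketch below. `time_stages` is a hypothetical helper, not the code behind the table: each stage is a zero-argument callable (closures can share the loaded data between stages), and it reports wall-clock milliseconds measured with time.perf_counter.

```python
import time

# Hypothetical timing helper; stages are zero-argument callables run in order.


def time_stages(stages):
    """Run (name, callable) pairs in order and return {name: elapsed milliseconds}."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings
```

For the startup cost discussed above you would then look at `timings["reload"] + timings["gen"]`, with "gen" being a stage that compiles the display lists inside a live GL context.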
Didn't we use to be able to put HTML in our comments? Oh well, you all know what I mean.
Nice work! I'd like to see numbers for ctypes arrays as well as numpy - they should be similar in speed, but without the numpy dependency.
That's a great idea. I can tell that it does perform well for generating, but I'm having trouble with the pickling part. When I define a ctypes array, it won't let me pickle it because the type isn't known:

>>> import ctypes, cPickle
>>> a = [1, 2, 3]
>>> b = (ctypes.c_int * 3)(*a)
>>> cPickle.dump(b, open("b.pickle", "wb"), 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cPickle.PicklingError: Can't pickle <class 'ctypes._endian.c_long_Array_3'>: attribute lookup ctypes._endian.c_long_Array_3 failed

Is there a way to make this work? I know you can pickle it as a regular list and convert it to ctypes only after unpickling. That does offer some improvement over plain Python lists, but not nearly as much as numpy arrays, which can be pickled directly.
Oh, that's irritating. I just assumed that ctypes arrays were easily picklable, but that obviously isn't the case. Converting to and from Python lists is undoubtedly quite slow, so it's not ideal. Maybe try strings? Something like this:

 s = ctypes.string_at(a, ctypes.sizeof(a))

followed by this once you've unpickled:

 a = (ctypes.c_int * (len(s) // ctypes.sizeof(ctypes.c_int))).from_buffer_copy(s)
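A Python 3 version of that string round trip might look like this sketch (the helper names are hypothetical). `bytes(arr)` does the same job as `ctypes.string_at(arr, ctypes.sizeof(arr))`, and the element type is pickled alongside the bytes. The array class itself (like `c_long_Array_3` in the traceback) is generated on the fly, so pickle can't look it up, but simple element types such as `ctypes.c_float` pickle fine by reference.

```python
import ctypes
import pickle

# Hypothetical helpers for pickling ctypes arrays via their raw bytes.


def dumps_ctypes_array(arr):
    """Pickle a ctypes array as (element type, raw bytes)."""
    return pickle.dumps((arr._type_, bytes(arr)), protocol=pickle.HIGHEST_PROTOCOL)


def loads_ctypes_array(data):
    """Rebuild the array with a single copy out of the raw bytes."""
    elem_type, raw = pickle.loads(data)
    count = len(raw) // ctypes.sizeof(elem_type)
    return (elem_type * count).from_buffer_copy(raw)
```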


Also, post your code! I'm sure this would be very useful for many people - I've seen lots of people use the Pygame cookbook OBJ loader in Pyweek entries, and this is obviously a vast improvement.
Yeah, that works. Thanks for the tip! Okay, I added two new strategies:
  • OBJ7: same as OBJ3 (glDrawArrays) but using ctypes arrays
  • OBJ8: same as OBJ5 (glDrawElements) but using ctypes arrays
Turns out it works great! Here's the updated list:

       load  save reload     gen  fps
OBJ0  11447  8346   6261       0    0.026
OBJ1  11223  8330   5974  171049    4.673
OBJ2  11725  9405   6243  100975    4.651
OBJ3  15445  2569   4550   20880  302.626
OBJ4  15609   264    115     453  309.804
OBJ5  16438   677   1237    6825  337.610
OBJ6  15949   138     71     372  305.214
OBJ7  18296   132     50     395  297.724
OBJ8  20395    74     35     337  315.322

Here's my code:
just objloader (12k)
code + data for testing (5.5M)

Unfortunately, it's not generally useful at this point. I stripped it down to essentials in order to run the test, and there's a lot it doesn't handle, especially textures and faces other than triangles.
I also tried the built-in array module, but it didn't work. When I go to use an array instance in a GL call, I get this:

TypeError: ("No array-type handler for type <type 'array.array'> (value: array('f', [0.68119502067565918, -3.59242892265319) registered", <OpenGL.arrays.arrayhelpers.AsArrayOfType object at 0x98f7c0c>)

I'm getting very confused looking up info on this module. Many web pages don't make clear whether they're referring to this module, to numpy arrays, or to both. So I have no idea whether it's possible to use these or not...
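One possible workaround for array.array, sketched under the assumption that PyOpenGL accepts ctypes arrays (as OBJ7 and OBJ8 suggest): array.array pickles natively, and ctypes can wrap its memory without copying via `from_buffer`. `as_ctypes_view` and its typecode table are hypothetical and only cover a few common typecodes.

```python
import array
import ctypes
import pickle

# Hypothetical helper; maps a few array.array typecodes to ctypes element types.
_CTYPES_FOR_TYPECODE = {
    "f": ctypes.c_float,
    "d": ctypes.c_double,
    "i": ctypes.c_int,
    "I": ctypes.c_uint,
}


def as_ctypes_view(arr):
    """Wrap an array.array in a ctypes array that shares its memory (no copy)."""
    ctype = _CTYPES_FOR_TYPECODE[arr.typecode]
    return (ctype * len(arr)).from_buffer(arr)


if __name__ == "__main__":
    verts = array.array("f", [0.5, -1.0, 2.25])
    restored = pickle.loads(pickle.dumps(verts))  # array.array pickles directly
    print(list(as_ctypes_view(restored)))  # [0.5, -1.0, 2.25]
```

Because the ctypes view shares memory with the array.array, the array can't be resized (append, extend) while the view is alive.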
What's this about pickling the vertex arrays? That's the only part of this discussion I don't understand at all. I have been using ctypes arrays. Why would you want to pickle them? Thanks!
Ah! Suddenly on re-reading through, I get it. You are re-saving the vertex arrays to disk using pickle, so that they can later be loaded quickly. Alright, I'm happy.