Warming up: motion demo
I'm very interested in reducing load times for Pyweek games. I like to jump right into the action. So I tried an experiment. I have a 3-D model with 60 frames that takes about 20 seconds to load into pyOpenGL. So my experiment was to just load 4 frames of the animation to start with, and then add the rest as you go along. I'm guessing you shouldn't generate a GL list in one thread while rendering the scene in another, so all the loading has to take place between game frames. Each animation frame, though, takes about 300ms to generate into a GL list, and that's way too long to stick into a single game frame. But I found out that you can chain together a sequence of GL lists, and it works just as well. So each animation frame comprises 20 GL lists, and each game frame, one of these lists gets generated. I'm still not very familiar with GL optimizations, though, so this might not be the way to go.
I'm pretty satisfied with the result. The framerate and load times are about the same whether you preload it or load it during the action. It starts off pretty jittery as you'd expect, but I think it's a good tradeoff. For an actual game, it would be better, because you'll have more than one action. No need to load the shooting frames before the game begins, if you'll have plenty of time before the player even gets a gun.
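For anyone who wants to try the same trick, here is a rough sketch of the scheduling side only: a queue that runs one build job per game frame. The class and `make_job` are made-up names for illustration, not code from the demo. In a real loader each job would presumably wrap `glGenLists` / `glNewList` / `glEndList` and return the list id, and drawing an animation frame would call `glCallList` on each of its chunks in order.

```python
from collections import deque


class IncrementalLoader:
    """Runs at most one queued build job per game frame."""

    def __init__(self):
        self.pending = deque()  # (frame_index, job) pairs still to run
        self.ready = {}         # frame_index -> list of finished chunks

    def queue_frame(self, frame_index, jobs):
        # jobs: callables, each building one chunk of this animation frame
        self.ready.setdefault(frame_index, [])
        for job in jobs:
            self.pending.append((frame_index, job))

    def step(self):
        """Call once per game frame, between rendering passes."""
        if not self.pending:
            return False
        frame_index, job = self.pending.popleft()
        self.ready[frame_index].append(job())
        return True

    def is_complete(self, frame_index, chunks_per_frame):
        return len(self.ready.get(frame_index, [])) == chunks_per_frame


def make_job(frame, chunk):
    # Hypothetical stand-in: a real job would compile one GL list
    # and return its id instead of this tuple.
    return lambda: (frame, chunk)


loader = IncrementalLoader()
for frame in range(3):
    loader.queue_frame(frame, [make_job(frame, c) for c in range(20)])

steps = 0
while loader.step():  # in a game, this call sits inside the main loop
    steps += 1
```

With 20 chunks per animation frame, a frame becomes drawable 20 game frames after its first chunk is built, so the renderer should check `is_complete` and fall back to an already loaded frame until then.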
Anyway, just thought I'd share. Anyone with more GL experience is welcome to give me tips! :)
motion demo
Comments
You can find an example of draw arrays and caching in an old revision of md2loader from the flightgame.
I reckon you could optimize this greatly by triangulating all the faces, and then a series of faces with the same material could be combined into a single glDrawArrays(GL_TRIANGLES...) call. This could potentially make the entire GL list compilation almost instantaneous. I'll try that by exporting the model from Blender triangulated.
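A minimal sketch of the grouping step described above, assuming every face is already a triangle. The function name and the input layout are my own choices for illustration. Each returned batch is a flat list of floats, which could then be handed to `glVertexPointer(3, GL_FLOAT, 0, batch)` and drawn with a single `glDrawArrays(GL_TRIANGLES, 0, len(batch) // 3)`.

```python
from collections import defaultdict


def group_by_material(vertices, faces):
    """vertices: list of (x, y, z) tuples.
    faces: list of (material_name, (i0, i1, i2)) with 0-based indices.
    Returns {material_name: [x0, y0, z0, x1, y1, z1, ...]}."""
    batches = defaultdict(list)
    for material, indices in faces:
        if len(indices) != 3:
            raise ValueError("expected triangulated faces")
        for i in indices:
            batches[material].extend(vertices[i])
    return dict(batches)


# A unit square split into two "red" triangles, plus one "blue" triangle.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [
    ("red", (0, 1, 2)),
    ("red", (0, 2, 3)),
    ("blue", (1, 2, 3)),
]
batches = group_by_material(vertices, faces)
```

Note that OBJ files use 1-based indices, so a real loader would subtract one when parsing.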
Also, I'm still working with normal Python lists here. I'm going to try numpy arrays, but I bet it won't make a noticeable difference. There's very little difference between reading straight from OBJ/MTL files, versus parsing these files and pickling the resulting objects, like I did with the demo above. I think Martin's right, that the bottleneck by an enormous margin is the number of GL calls. If that's the case, it's really simple once you understand that - it's too bad most of the OpenGL material online isn't written with Python in mind.
I have been meaning to write something about OpenGL strategies for Python - as you say, most of the material online is focused on the API itself, and it's often not clear what advice transfers over and what doesn't.
<ul>
<li>OBJ0: Extremely naive baseline. Doesn't even use display lists, just renders all the polygons on demand with one polygon render per face. Every other strategy uses a display list.
<li>OBJ1: The strategy used in the objloader module from the Pygame wiki. One polygon render per face.
<li>OBJ2: Use one glDrawArrays call per face, updating the vertex pointer each time.
<li>OBJ3: One glDrawArrays call per material. Assumes everything is triangulated.
<li>OBJ4: Same as OBJ3 but uses numpy arrays instead of python lists.
<li>OBJ5: Uses one glDrawElements call per material, using a list of indices.
<li>OBJ6: Same as OBJ5 but uses numpy arrays instead of python lists.
</ul>
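For the index-list strategies (OBJ5 and OBJ6), one plausible way to build the indices is sketched below. It is not necessarily what the test code does. It assumes each face corner is a (vertex index, normal index) pair as in an OBJ file, gives each distinct pair one slot in a shared table, and collects one index list per material. Each index list could then go to `glDrawElements(GL_TRIANGLES, len(indices), GL_UNSIGNED_INT, indices)`.

```python
def build_index_lists(faces):
    """faces: list of (material, corners), where corners is three
    (vertex_index, normal_index) pairs.
    Returns (table, indices): table is the list of unique corners,
    indices maps each material to a flat list of positions in table."""
    position = {}  # corner -> slot in the shared table
    table = []
    indices = {}
    for material, corners in faces:
        out = indices.setdefault(material, [])
        for corner in corners:
            if corner not in position:
                position[corner] = len(table)
                table.append(corner)
            out.append(position[corner])
    return table, indices


# Two triangles sharing an edge: 6 corners but only 4 unique ones.
faces = [
    ("skin", ((0, 0), (1, 0), (2, 0))),
    ("skin", ((0, 0), (2, 0), (3, 0))),
]
table, indices = build_index_lists(faces)
```

The shared table is what you would expand into the actual position and normal arrays; the saving comes from shared corners being stored once.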
<p>I wanted to try one with VBOs as well, but I haven't figured them out yet. For each strategy I timed four steps: loading the objects from the OBJ files, pickling the objects, reloading the objects from a pickle file, and generating the display lists. Probably the sum of the reloading and generating is the important number, because that's what you'd do to start a game. Then I also measured the FPS of rendering a bunch of sprites:
<pre>       load   save  reload     gen      fps
OBJ0  11447   8346    6261       0    0.026
OBJ1  11223   8330    5974  171049    4.673
OBJ2  11725   9405    6243  100975    4.651
OBJ3  15445   2569    4550   20880  302.626
OBJ4  15609    264     115     453  309.804
OBJ5  16438    677    1237    6825  337.610
OBJ6  15949    138      71     372  305.214</pre>
<p>There's a surprisingly large difference in framerate when the number of GL calls within the display list is minimized. I expected the gen-time to improve, but not necessarily the framerate. Using numpy arrays made a huge improvement in reload+gen time, and using an index list and DrawElements over DrawArrays made a smaller improvement.
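If you want to repeat this kind of measurement, a harness for the four timed steps might look roughly like the sketch below. It is an assumption about the general shape, not the actual test script: `load` and `gen` are placeholder callables, pickling goes to memory rather than to a file, and the times are reported in milliseconds by my own choice.

```python
import pickle
import time


def timed(fn, *args):
    """Return (result, elapsed milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def benchmark(load, gen):
    """load() builds the model data from source files;
    gen(data) stands in for display-list compilation."""
    data, t_load = timed(load)
    blob, t_save = timed(pickle.dumps, data, pickle.HIGHEST_PROTOCOL)
    data2, t_reload = timed(pickle.loads, blob)
    _, t_gen = timed(gen, data2)
    return {"load": t_load, "save": t_save, "reload": t_reload, "gen": t_gen}


# Placeholder workload: a list of floats and a trivial "gen" step.
results = benchmark(lambda: [float(i) for i in range(30000)], sum)
for step, ms in results.items():
    print(f"{step:>6}: {ms:8.3f} ms")
```

A single run is noisy, so for real comparisons it would be worth repeating each step a few times and taking the minimum.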
>>> import ctypes, cPickle
>>> a = [1, 2, 3]
>>> b = (ctypes.c_int * 3)(*a)
>>> cPickle.dump(b, open("b.pickle", "wb"), 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cPickle.PicklingError: Can't pickle <class 'ctypes._endian.c_long_Array_3'>: attribute lookup ctypes._endian.c_long_Array_3 failed
Is there a way to make this work? You can pickle it as a regular list and only convert it to ctypes after unpickling it. This does offer some improvement over normal python lists, but not nearly as much as numpy arrays do, since they can be pickled.
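The workaround mentioned here (pickle a plain list, convert only after unpickling) can be sketched as follows. This uses Python 3's `pickle` rather than `cPickle`, and `to_c_floats` is a helper name made up for the example.

```python
import ctypes
import pickle


def to_c_floats(values):
    """Copy a Python list of numbers into a new ctypes float array."""
    return (ctypes.c_float * len(values))(*values)


vertices = [0.0, 1.0, 0.5, 2.0, -1.0, 0.25]

# Save side: the plain list pickles without trouble.
blob = pickle.dumps(vertices, pickle.HIGHEST_PROTOCOL)

# Load side: unpickle first, then build the ctypes array.
restored = to_c_floats(pickle.loads(blob))
```

The conversion still walks every element in Python, which fits the observation that this helps less than arrays that can be pickled directly.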
s = ctypes.string_at(a, ctypes.sizeof(a))
followed by this once you've unpickled:
a = (ctypes.c_int * (len(s) // ctypes.sizeof(ctypes.c_int))).from_buffer_copy(s)
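Put together as a full round trip, the two lines above might look like this. It is a sketch in Python 3, where `ctypes.string_at` returns `bytes`, and the two helper names are mine.

```python
import ctypes
import pickle


def array_to_bytes(c_array):
    """Copy the raw memory of a ctypes array into a bytes object."""
    return ctypes.string_at(c_array, ctypes.sizeof(c_array))


def bytes_to_int_array(raw):
    """Rebuild a ctypes int array from bytes made by array_to_bytes."""
    count = len(raw) // ctypes.sizeof(ctypes.c_int)
    return (ctypes.c_int * count).from_buffer_copy(raw)


a = (ctypes.c_int * 3)(1, 2, 3)
blob = pickle.dumps(array_to_bytes(a), pickle.HIGHEST_PROTOCOL)
b = bytes_to_int_array(pickle.loads(blob))
```

The raw bytes carry no type or byte-order information, so the loader has to know the element type, and the pickle is only safe to read on a machine with the same layout as the one that wrote it.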
Also, post your code! I'm sure this would be very useful for many people - I've seen lots of people use the Pygame cookbook OBJ loader in Pyweek entries, and this is obviously a vast improvement.
- OBJ7: same as OBJ3 (glDrawArrays) but using ctypes arrays
- OBJ8: same as OBJ5 (glDrawElements) but using ctypes arrays
       load   save  reload     gen      fps
OBJ0  11447   8346    6261       0    0.026
OBJ1  11223   8330    5974  171049    4.673
OBJ2  11725   9405    6243  100975    4.651
OBJ3  15445   2569    4550   20880  302.626
OBJ4  15609    264     115     453  309.804
OBJ5  16438    677    1237    6825  337.610
OBJ6  15949    138      71     372  305.214
OBJ7  18296    132      50     395  297.724
OBJ8  20395     74      35     337  315.322
Here's my code:
just objloader (12k)
code + data for testing (5.5M)
Unfortunately, it's not generally useful at this point. I stripped it down to essentials in order to run the test, and there's a lot it doesn't handle, especially textures and faces other than triangles.
TypeError: ("No array-type handler for type <type 'array.array'> (value: array('f', [0.68119502067565918, -3.59242892265319) registered", <OpenGL.arrays.arrayhelpers.AsArrayOfType object at 0x98f7c0c>)
I'm getting very confused looking up info on this module. Many webpages are not clear about whether they're referring to this module, or numpy arrays, or both. So I have no idea if it's possible to use these or not....
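One possible way around the error, sketched under the assumption of Python 3 (where `array.array` exposes a writable buffer): wrap the `array.array` in a ctypes array, which is a type the tests in this thread already pass to GL. `as_c_float_array` is a name invented for the example, and I have not checked which PyOpenGL versions, if any, accept `array.array` directly.

```python
import array
import ctypes


def as_c_float_array(values):
    """Wrap an array('f') in a ctypes float array sharing its memory."""
    if values.typecode != "f":
        raise TypeError("expected array('f', ...)")
    return (ctypes.c_float * len(values)).from_buffer(values)


coords = array.array("f", [0.5, -3.5, 1.25])
c_coords = as_c_float_array(coords)
```

No data is copied: both objects view the same memory, so the original array cannot be resized while the ctypes wrapper is alive.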
Martin on 2010/03/07 01:51:
The real goal here is to minimise the number of GL calls you're making (calls across the Python-C bridge are always slow). Display lists are a good optimisation here, and even when they're not the fastest thing for the video card, they can be a net win because they only require a single GL call to use.
However, when you're compiling your GL lists, you're using bucketloads of GL calls. The OBJ loading code which you're using is based around immediate mode (glVertex3f and friends). This means multiple GL calls per vertex, which is going to slow things to a crawl for geometry of any level of sophistication. The secret is to make use of GL's vertex array functionality - I'd suggest having a look at glDrawArrays and related functions.
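To put a number on the difference in call counts, here is a small sketch that runs without a GL context. `CallCounter` is a made-up stand-in for the `OpenGL.GL` module that only counts calls; the GL function names are the real ones, and passing the real module instead (with a context set up) should issue the actual calls.

```python
class CallCounter:
    """Stand-in for OpenGL.GL: counts calls instead of drawing."""

    def __init__(self):
        self.calls = 0

    def __getattr__(self, name):
        if name.startswith("GL_"):
            return name  # fake constant
        def record(*args):
            self.calls += 1
        return record


def draw_immediate(gl, triangles):
    # Immediate mode: one GL call per vertex, plus glBegin/glEnd.
    gl.glBegin(gl.GL_TRIANGLES)
    for triangle in triangles:
        for x, y, z in triangle:
            gl.glVertex3f(x, y, z)
    gl.glEnd()


def draw_with_array(gl, flat_vertices):
    # Vertex array: a fixed number of calls regardless of model size.
    gl.glEnableClientState(gl.GL_VERTEX_ARRAY)
    gl.glVertexPointer(3, gl.GL_FLOAT, 0, flat_vertices)
    gl.glDrawArrays(gl.GL_TRIANGLES, 0, len(flat_vertices) // 3)
    gl.glDisableClientState(gl.GL_VERTEX_ARRAY)


triangles = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))] * 1000
flat = [c for tri in triangles for vertex in tri for c in vertex]

immediate, arrays = CallCounter(), CallCounter()
draw_immediate(immediate, triangles)
draw_with_array(arrays, flat)
print(immediate.calls, "calls vs", arrays.calls)
```

Adding normals or texture coordinates widens the gap further, since immediate mode needs another call per vertex for each attribute while the array version needs one extra pointer call in total.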