Postmortem - Bamboo Warrior

mauve on 2010/04/18 10:48
— edited on 2010/04/20 08:17

Thanks to everyone who played my game, except the person who gave me 1 point for fun because they couldn't find either of the two jump keys. :-(

I think the reviews my game got were generally fair, picking up on things I already knew about but that would take days more to polish. Spikes and pits were on the drawing board (well, wiki), and so are the more innovative ideas that I don't want to give away right now. Sound and music were removed midway through development because they caused crashes for me.

I've learned a lot from PyWeek and have really enjoyed playing the games. Many of the games I would love to see developed further. Thanks to Richard for organising the competition.

Now, I have some more controversial observations about the PyWeek competition, specifically the review process. I think the rating system is too narrow. I would like to see a review system that allows the reviewer to choose his own criteria, perhaps on a straight 10-point scale. I personally think innovation should not be prioritised as much as fun - I love innovation, but it's totally irrelevant in a boring or shoddy game. And I like games that demonstrate open-ended potential - I'd like to rate concept more than innovation. But all that would be up to each individual reviewer.

The other thing is that it seems to be completely impossible for a game to be disqualified, even for breaking what to me is the cardinal rule: adherence to the theme. This could be the first question on the review form: "Does this game tie in with the Wibbly-Wobbly theme?" It's such a vital rule because other competitors will be obeying it, even developing their second- or third-choice idea, so completely disregarding the theme (or constructing the most tenuous of tie-ins) creates an uneven playing field. It's nothing less than cheating.

I would love there to be a "people's choice" award where unregistered users can upvote games. Obviously someone could write a bot to upvote their own game, but preventing or catching that is a separate question from whether the award is desirable. I think a people's choice award would give at least some indication of whether the official scores are in line with how non-developers perceive the entries.

What do you think? Discuss.


Comments

mauve on 2010/04/18 11:09:

Another idea: do what they do in the Olympics and disregard the lowest and highest score, to avoid the biggest outliers distorting the average.
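
The Olympic-style rule above is what statisticians call a trimmed mean. Here is a minimal sketch, assuming each entry's scores for one category arrive as a plain list of 1-5 ratings; `olympic_mean` is a hypothetical helper, not something the PyWeek site provides.

```python
from statistics import mean

def olympic_mean(scores):
    """Average the scores after dropping one lowest and one highest value."""
    if len(scores) < 3:
        # Too few ratings to trim anything meaningfully; use the plain mean.
        return mean(scores)
    trimmed = sorted(scores)[1:-1]  # drop the single lowest and highest score
    return mean(trimmed)

ratings = [1, 4, 4, 5, 3, 4]
print(mean(ratings))          # 3.5
print(olympic_mean(ratings))  # 3.75
```

With only a handful of ratings per entry, dropping two of them throws away a large share of the data, which is one reason to be cautious about the idea.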

cyhawk on 2010/04/18 11:22:

The goal of this competition is to have fun making a game, not to win. So you shouldn't get too hung up on the scoring. Our game-making skills vary wildly, and so does our ability to score a game meaningfully, so you will inevitably disagree with a number of reviewers (especially since everybody is biased towards their own game).

I think this is fine for PyWeek since there are no material rewards. And if you made a standout game you can always give it some final touches and enter it in a more serious competition, like PAX.

stycchio on 2010/04/18 11:35:

http://www.youtube.com/watch?v=WL1lfSzgcAw

ldle on 2010/04/18 17:37:

stycchio is apparently a troll with nothing to contribute except spam.

mauve on 2010/04/18 18:24:

cyhawk: I think the goal of the competition can be subjective.

My goal this week was to see what I could achieve in a week, and to brush up my real-time game skills after doing web-based and turn-based stuff for my past few projects. I'd add that one objective of the competition should be to make great games, and I think a fairer, truer rating system can encourage that. I'm not hung up on it; I just wanted to share my ideas.

The issue of adherence to the theme on the other hand has the potential to make me bloody furious. I don't know why people aren't incensed that (by my least generous criteria) some 10 entries completely ignored the theme.

Tee on 2010/04/18 19:36:

While my goal is more aligned with cyhawk's, I also think the rating system is a bit strange, but I'm not sure what the best way to improve it would be. As someone else said the other day, the average of your three ratings doesn't always correspond to your overall rating. Maybe we should have a separate overall rating? In a sense, that would represent the reviewer choosing his own criteria like you suggested.

I like the idea of doing something about the outliers, though I'm not sure simply disregarding the worst and best scores is a good idea. Maybe someone with better knowledge of statistics could suggest a measure that isn't affected much by outliers? Something a bit more like a median, maybe?
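
To illustrate why a median resists outliers, here is a small sketch using Python's standard `statistics` module; the ratings are made-up numbers for a single hypothetical entry, not real PyWeek data.

```python
from statistics import mean, median

# Hypothetical fun ratings for one entry: mostly 4s plus one harsh outlier.
ratings = [4, 4, 5, 4, 1]

print(mean(ratings))    # 3.6 (dragged down by the single 1)
print(median(ratings))  # 4 (the middle of the sorted list, however low the 1 is)
```

One trade-off: with integer ratings the median can only land on whole or half points, so many entries would tie, whereas a trimmed mean keeps finer distinctions.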

About the theme, I was wondering whether it would be a good idea for everyone to at least write somewhere how the theme ties into their game. Personally, it doesn't bother me much since there aren't many cases anyway and my goal is just to have fun making a game, but if there are people who care about others following the theme, this might be a good idea.

Cosmologicon on 2010/04/18 20:29:

mauve: I reserve about half of the Innovation ratings I give for "use of theme". You might try that. It would de-prioritize innovation and prioritize use of theme. I also get extremely disappointed in games where the theme seems tacked on.

However, I very much disagree with you that Doctor When used the theme poorly. On the contrary, I thought it was one of the best uses, precisely because it's not the first thing you think of. I like it when people interpret the theme differently than I do. The complaint that it "comes from someone else's intellectual property" is just bizarre. My idea of a poor use is when the theme only appears in the title or the character's name, and has nothing to do with the game mechanic.

So I definitely want to see all games based on the theme. But if that means "based on what mauve thinks the theme means", I'm not with you.

mauve on 2010/04/18 23:14:

I wasn't putting myself up as a final arbiter of theme relevance, of course, and I'm open to more unusual tie-ins as much as the next person.

I don't see how the theme being mentioned only in a title differs from the theme being mentioned in passing by a character in a TV series two years ago, which you've decided to quote. Except that quoting a TV series requires no originality whatsoever, and it's a connection several steps more tenuous. If you haven't seen the TV show, it's no connection at all.

That's what I meant by "someone else's intellectual property" - I wasn't complaining about their infringement of copyright and trademarks (though they do), just that the idea isn't even an original concept, which might count for something.

Turn it around and consider the consequences. If you can use any of the ideas in any TV show just because once-upon-a-time one of the characters mentioned the name of the theme, there is enough recorded TV dialogue to link practically any concept to any theme, and that of course defeats the object of having a theme just as much as being able to disregard the theme.

richard on 2010/04/19 00:36:

Almost every PyWeek I have a look at the rating system and think about how it might be improved. Splitting out the "overall" rating might help, but really it'll just be another number that people will need to fill in. I'm not sure that'll help.

I've also thought about having a simple system where people enter their top (individual and team) game and those are tallied to figure out the popular choice.
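
The tally described above is a simple plurality count per division. A minimal sketch, assuming each ballot is a dict naming one favourite individual entry and one favourite team entry; `tally_favourites` and the entry names are hypothetical placeholders.

```python
from collections import Counter

def tally_favourites(ballots, division):
    """Count how many voters named each entry as their favourite in one division."""
    return Counter(ballot[division] for ballot in ballots if division in ballot)

ballots = [
    {"individual": "entry_a", "team": "entry_x"},
    {"individual": "entry_b", "team": "entry_x"},
    {"individual": "entry_a"},  # this voter skipped the team division
]
print(tally_favourites(ballots, "individual").most_common(1))  # [('entry_a', 2)]
print(tally_favourites(ballots, "team").most_common(1))        # [('entry_x', 2)]
```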

The ratings are always going to be a little all over the place for a given entry - everyone's got their own ideas about what "good production" means or whether a particular interpretation of the theme is appropriate. To allow an increased range of voices in the comments I don't think it's a good idea to get too specific about what a "good production" score means.

For the record, my stance is that any interpretation of the theme is allowed. The only reason I'll disqualify someone is for breaking the hard rules, things like: submitting an existing game; using IP that they don't have permission to use; or submitting under a license that's not appropriate.

Since this challenge isn't a Real Competition with prizes and all that jazz, we can be a lot more laid-back about things. This is intentional. The theme exists to inspire first and provide some small amount of rules adherence second (a very distant second at that.)

I'd love to do some data mining to see how the ratings have changed over time. I have this feeling that the scores have generally averaged lower over time. This isn't necessarily that surprising since as game players we've become more sophisticated. On the other hand the tools available to us as game developers are also more sophisticated.

I also need to look at the numbers to see whether randomising the entries page helped even out the spread of ratings across all the entries; there are clearly some entries that received far fewer ratings, but I believe that could be related to how complex it was to get the game running.

@stycchio if you can't play nicely then please don't play at all :-)

Tee on 2010/04/19 01:08:

Personally, the ideal for me would be a challenge system that doesn't use ratings at all and encourages good feedback. Maybe there would be only one score to represent your overall opinion of the game, but without the rankings and everything. Highlights in specific categories could be recognised with preset, easy-to-give awards, so, for example, someone might get 3 silver medals in innovation and 2 bronze medals in innovation, and there would be a nice screen showing how many medals you got in each category. I like this because it gives you a more tangible idea of what people liked in your game.
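
The "medals per category" screen boils down to grouping counts. A minimal sketch, assuming each award is recorded as a (category, medal) pair; `medal_table` and the sample awards are hypothetical.

```python
from collections import Counter

# Hypothetical medals given to one game by reviewers, as (category, medal) pairs.
awards = [
    ("innovation", "silver"),
    ("innovation", "silver"),
    ("innovation", "silver"),
    ("fun", "bronze"),
    ("fun", "bronze"),
]

def medal_table(awards):
    """Group medal counts by category, e.g. {'innovation': Counter({'silver': 3})}."""
    table = {}
    for category, medal in awards:
        table.setdefault(category, Counter())[medal] += 1
    return table

for category, medals in medal_table(awards).items():
    print(category, dict(medals))
# innovation {'silver': 3}
# fun {'bronze': 2}
```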

These are not suggestions, I'm just thinking out loud. I know some people like the competition and having ratings and everything.

I also personally dislike the idea of only entering the top games, mostly because it encourages a sense of competition more than the motivation to simply create good games (but, again, some people might like that). Oh, and I like the fact that Pyweek is laid-back. It allows me to experiment with games and not worry much about failure. :)

Cosmologicon on 2010/04/19 01:34:

Not that anybody asked me, but the only thing I would change about the rating system is to add more categories. Ludum Dare has Innovation, Fun, Theme, Graphics, Audio, Humor, Overall, and Community. But you'd really have to decide which categories fit all potential games; Audio might not and Humor definitely doesn't.

I might go with Creativity, Use of Theme, Gameplay Mechanic, Replay Value, Artwork, Polish/bugfreeness, and Overall. Actually, that's two sub-categories of Innovation, Fun, and Production each, plus an overall.

I don't want to have to pick my favorite game, because it's hard to pick out of 40 when I haven't played one for almost two weeks. As a judge, I prefer rankings.

gcewing on 2010/04/19 08:00:

I have this feeling that the scores have generally averaged lower over time. This isn't necessarily that surprising since as game players we've become more sophisticated.

My impression is that the quality of the top games has actually been declining recently. When I first started participating in PyWeek, there was usually at least one game that really stood out as being remarkably complete, polished and truly engaging to play for a substantial period of time, but I didn't get that feeling about any of them this time. The one that came closest for me was Stratejelly, but it had enough flaws that it didn't quite get there.

I don't think this is because I've become "more sophisticated" as a game player -- I've been playing games for a lot longer than PyWeek has existed, and it seems unlikely that my general sense of what constitutes an engaging game has been changed much by playing PyWeek games.

Maybe this is just a random statistical fluctuation. Maybe the best entrants of past PyWeeks didn't have as much time for it this time, or maybe they weren't lucky enough to come up with a really good idea. Maybe the theme just wasn't very inspiring this time. I know I had great difficulty coming up with an idea for WW that I could get properly enthusiastic about, and my entry suffered as a result.

On the other hand the tools available to us as game developers are also more sophisticated.

Yes, but I doubt this makes a lot of difference when it comes to that undefinable quality that makes a game stand out. Creating a truly great game requires a rare combination of skills and sheer talent. No tool can make up for the lack of that.

ldle on 2010/04/19 12:59:

My impression is that the quality of the top games has actually been declining recently.

See, this is precisely why I'm annoyed this Pyweek. We specifically aimed to release a polished-to-the-end arcade game, totally finished -- yet the community didn't respond by rewarding us with scores.

I don't think it has *anything* to do with us getting better at playing games.

mauve on 2010/04/19 18:31:

I like Tee's idea of medals. It sounds really positive. Keeping count of the value of medals someone has won would give you an estimate of how much respect someone deserves. Of course it's similar to the existing awards system, but awards are used more like one-off bonuses.

gcewing: I'd add luck to your rare combination. Sometimes you get lucky and nail some great features, other times everything conspires to thwart you.

bjorn on 2010/04/19 23:23:

Picking your favorite would be rough because some of us, unfortunately, just don't get a chance to play and rate all of the games; I know I missed some really good ones this time around, but I'm glad we have the scores so I can get some idea of which those are (and especially which are likely to be most fun).

I'm interested in if randomizing the listing helped distribute the rankings more evenly. If you can get a relatively good distribution of ratings across games, then it's possible that normalizing each individual's ratings so that they average 3 in all categories would give "better" overall results, but then again maybe not.
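
A minimal sketch of the per-reviewer normalization described above, assuming ratings for one category come as (reviewer, entry, score) tuples. It uses a simple additive shift so each reviewer averages 3; scaling by each reviewer's spread (z-scores) would be another option. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def normalize_by_reviewer(ratings, target=3.0):
    """Shift each reviewer's scores so that reviewer's average becomes `target`."""
    by_reviewer = defaultdict(list)
    for reviewer, _entry, score in ratings:
        by_reviewer[reviewer].append(score)
    offsets = {r: target - mean(scores) for r, scores in by_reviewer.items()}
    return [(r, e, s + offsets[r]) for r, e, s in ratings]

def entry_averages(ratings):
    """Average score per entry."""
    by_entry = defaultdict(list)
    for _reviewer, entry, score in ratings:
        by_entry[entry].append(score)
    return {e: mean(scores) for e, scores in by_entry.items()}

# A harsh reviewer "h" and a generous reviewer "g" who rated different games.
ratings = [("h", "A", 3), ("h", "C", 1), ("g", "B", 5), ("g", "C", 5)]
print(entry_averages(ratings))                         # {'A': 3, 'C': 3, 'B': 5}
print(entry_averages(normalize_by_reviewer(ratings)))  # {'A': 4.0, 'C': 2.5, 'B': 3.0}
```

Normalizing flips the order here: B only looked best because it drew the generous reviewer. That also shows the caveat: the adjustment is only fair if each reviewer rates a reasonably representative spread of games, and shifted scores can fall outside the 1-5 range.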

My opinion is the current rating system usually gives good enough results to point out what the top games are. The ones I would pick as the best games of the competition don't always win, but they're usually in the top three.

gcewing on 2010/04/20 00:10:

We specifically aimed to release a polished-to-the-end arcade game, totally finished -- yet the community didn't respond by rewarding us with scores.

You can't expect to be rewarded with top scores for what you intended to produce -- you have to actually pull it off.

In the case of Bamboo Warrior, although it was very nice visually, for me it just wasn't substantial enough to warrant a top ranking. Part of that may be because games where fighting is the only thing to do aren't very appealing to me personally, so I tend not to give such games a very high fun score. Another person who does enjoy that kind of game might give it a higher score. So there's luck involved here too -- your taste in games has to happen to coincide with that of your judges.

By the way, I don't intend my comments about the quality of games to be a complaint. Producing a top-quality game in a week is a remarkable feat, and not something you can really expect people to do on a regular basis. The surprising thing is that it happens as often as it does!

Cosmologicon on 2010/04/20 00:43:

I'm interested in if randomizing the listing helped distribute the rankings more evenly.

If you mean the number of rankings, yes, I think it helped distribute them more evenly. Maybe not 100%, but it definitely helped. Here's a lovely chart:

and for comparison, here is a similar chart I made for Pyweek 8:

richard on 2010/04/20 01:14:

That's a lovely chart, thank you :)

gizmo_thunder on 2010/04/20 08:36:

Really nice charts.

mauve on 2010/04/20 08:45:

In the case of Bamboo Warrior, although it was very nice visually, for me it just wasn't substantial enough to warrant a top ranking.

Yes, I'm aware that it was insubstantial. The gameplay was not where I wanted it to be, partly because I don't have enough frames of animation, which makes the fighting seem very stilted, and partly because the AI isn't very natural, and partly because I didn't have time to add any variety. But I thought it was fun. Hopefully I can nail the gameplay for the Pyggys.

You can't expect to be rewarded with top scores for what you intended to produce -- you have to actually pull it off.

I feel slightly differently - as I mentioned above, if I think a game demonstrates promise I feel more generous when reviewing than if I think this is all the game will ever be. It's one of the reasons I didn't rate Street Performer as highly as I could have done. It feels like it is what it is, and if it is that, it feels like a competent Flash game, and competent Flash games are ten a penny.

Tee on 2010/04/20 12:33:

Cosmologicon: Nice charts. It seems randomizing the entries did help. Could you make the other charts you did for Pyweek 8 (I don't remember exactly what they were, but I think they compared ratings) for this Pyweek too, and put them side by side so we can compare the ratings between the two Pyweeks (or, even better, overlay them in different colors)?

mauve: I don't get your comparison with Flash games. How is a competent Flash game different from a competent Python game? Most of the 2D games from Pyweek could've been done in Flash.

If a game shows promise, I usually just mention it in the comment, and add an extra innovation point if it deserves one. I'm not sure if this is what you meant, but anyway, I don't think a game that shows promise but is incomplete deserves as much as an otherwise equal game that is complete and polished.

Cosmologicon on 2010/04/20 14:01:

Okey dokey, I'll get right on that chart thingy... soonish.... I should probably write it up as a Python script so that people can graph whatever they want.
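
Cosmologicon's actual script isn't shown in the thread; as a hypothetical starting point, here is a text-only sketch of the "number of ratings per entry" chart, assuming the raw data is a list of (reviewer, entry) pairs. The function names and sample data are illustrative only.

```python
from collections import Counter

def ratings_per_entry(ratings):
    """Count how many ratings each entry received from (reviewer, entry) pairs."""
    return Counter(entry for _reviewer, entry in ratings)

def text_chart(counts):
    """Render one '#' per rating, most-rated entries first."""
    width = max(len(name) for name in counts)
    return "\n".join(f"{name.ljust(width)} {'#' * n}" for name, n in counts.most_common())

ratings = [("r1", "entry_a"), ("r2", "entry_a"), ("r3", "entry_a"), ("r1", "entry_b")]
print(text_chart(ratings_per_entry(ratings)))
# entry_a ###
# entry_b #
```

Swapping `text_chart` for a plotting library would give a proper bar chart; the counting step stays the same.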

I agree that 99% of 2-D Pyweek games strongly resemble Flash games, competent or otherwise. Bamboo Warrior is certainly no exception. The only real exception I can think of is Robot Underground, and that's just because of its scope, not because I particularly like it.

mauve on 2010/04/20 14:26:

Tee: Don't get me wrong, incompleteness definitely detracts. I'm not talking about quality, I'm talking about concept. How much a game inspires me, how ambitious it is, whether it makes me want to contribute to the project. I like strong concepts, perhaps even more than I like innovation.

My point about Flash is that many Flash games have a fairly narrow scope because Flash carries loads of limitations that Python doesn't. Flash games tend to end up attractive and polished, but simple.

Tee on 2010/04/20 15:48:

mauve: I understand, I feel the same about strong concepts. I was just saying that completeness counts, too.

As for the Flash topic, I know what you mean, but I think that feeling is probably a bit misleading: there are loads of Flash games out there, so you end up seeing a lot of very simple ones. But I've played a lot of gems implemented in Flash that have very good concepts and are well executed (and not necessarily attractive). The limitations of Flash are mostly technical rather than design-related, so they shouldn't interfere much with the concept, which is limited far more by the designer's imagination and the developers' effort to execute it well than by the technology used. The only technical limitation that really matters for executing good concepts in 2D games is that the size has to be kept reasonable, which ends up limiting the scope of the game like you said. Still, some developers overcome that limitation by making their games downloadable and even selling them (for example, N+ and VVVVVV). But I understand what you mean: you're thinking of the "middle blob" of Flash games that put most of their effort into "production" without much depth in gameplay.

richard on 2010/04/20 21:55:

Cosmo: if you want access to the raw numbers from the db to do more fun analysis just let me know.