Statistics of judging results

I think it's interesting to look at the results, see which entries are outliers, and maybe get some hints for the future. I thought I'd share a little bit.

Scores in all three categories are correlated, and most strongly for fun and production. Here's a table of correlation coefficients:

Category pair          Individual  Team  All
Innovation/Fun         0.50        0.59  0.55
Production/Fun         0.83        0.76  0.77
Production/Innovation  0.54        0.77  0.65
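For anyone who wants to reproduce numbers like these from the spreadsheet, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The score lists are made-up placeholder values, not the real judging data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances without the 1/n factor (it cancels out in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical average scores for five entries (not real PyWeek data).
scores_fun = [3.1, 2.4, 3.8, 2.9, 3.5]
scores_production = [3.0, 2.2, 3.9, 3.1, 3.3]
print(round(pearson(scores_fun, scores_production), 2))
```

The result always lies between -1 and 1: values near 1 mean the two scores rise together, values near 0 mean no linear relationship.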

Here are graphs of the three pairs of categories.

And here's a question I was curious about: does being at the top of the listings page mean that more people rate your entry? Looking at the data, absolutely:

If we throw out the outlier cranberry (which shouldn't be in there, since it's not actually on the entries page), the correlation between an entry's position on the listings page and its number of judges is r = -0.64. There's also a moderate correlation (r = 0.31) between having more judges and getting higher scores. So you might think that an entry earlier in the alphabet has a better chance. However, alphabetical order is not correlated with score (r = 0.01). Still, I wonder if something can be done to encourage judges to spread their ratings more evenly across the entries.

Anyway, just some thoughts. Mostly I'm pleased that PyWeek is large enough to do this sort of analysis in the first place! I uploaded the spreadsheet here if you want it.


Comments

Ordering the games randomly (in a different order for every user) would solve this issue, but it would be somewhat inconvenient for the judges. Alternatively, judges could receive some benefit for judging all entries. I know I felt like a hero when I finished judging the last entry :). Giving extra score to the judge's own game would be a bit too much, I think, but something small, like an official award or a small icon by the user name, may be just enough incentive to get more people to judge all the games.
Taking a leaf out of the IF Comp's book, the site could give each logged-in user a randomly ordered list of which games to play first (available in addition to the regular list, in case you still want to find a particular game), which stays the same for the duration of the judging period. Or simply have a prominent button on the entries list that takes you to the page for a random entry you haven't rated yet. Since the entries page is usually quite visually cluttered, it's otherwise hard to find a methodical way to play through the games. And, um, what's the story with cranberry?
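The "random entry you haven't rated yet" button boils down to a small selection step. A minimal sketch, assuming hypothetical `all_entries` and `rated_by_user` collections; this is not the actual PyWeek site code.

```python
import random


def pick_unrated(all_entries, rated_by_user, rng=random):
    """Return a random entry the user has not rated yet, or None if all are rated."""
    # Hypothetical inputs: all_entries is a list of entry names,
    # rated_by_user is a set of the names this user has already rated.
    remaining = [e for e in all_entries if e not in rated_by_user]
    if not remaining:
        return None
    return rng.choice(remaining)


all_entries = ["aiamsori", "null", "Yukkuri"]
print(pick_unrated(all_entries, {"null"}))
```

Returning None when everything is rated lets the page hide the button once a judge has finished.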
@adam: cranberry explicitly said not to bother judging, and did not have an upload marked as final, so it looks like I was the only one who bothered :).

For random ordering, a simple solution would probably be to just do random.seed(hash(user.name)); random.shuffle(entries) before listing them, so the order does not need to be stored. If an ordered list is also available, it may not be as inconvenient as I imagined.
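One caveat if this were done in modern Python: since Python 3.3, `hash()` of a string is randomized per process (controlled by `PYTHONHASHSEED`), so `hash(user.name)` would give each user a new order after every server restart. Seeding a private `random.Random` instance with the user name itself avoids that and leaves the global random state alone. A minimal sketch, with the user names and entry list as placeholders:

```python
import random


def judging_order(entries, user_name):
    """Return the entries in a per-user order that is stable between requests."""
    # random.Random hashes a string seed deterministically (unlike hash(),
    # which is salted per process), so for a given Python version the same
    # user always gets the same order and nothing needs to be stored.
    rng = random.Random(user_name)
    order = list(entries)  # copy, so the caller's list is not shuffled in place
    rng.shuffle(order)
    return order


entries = ["aiamsori", "null", "Yukkuri", "Moon Miner", "Zombie Zoo"]
print(judging_order(entries, "alice"))
print(judging_order(entries, "bob"))
```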

That sounds like a great modification. Maybe it could also hide entries that I have already rated, so that I don't have to look through 40 rated entries to find the 10 I haven't.
Let me just point out that this has been mentioned before, and at the time Richard seemed to indicate that it wasn't going to be added soon. Maybe it will, but if not, is there anything we can do without modifying the website? The torrent could at least contain a randomizing script, I think. I used something like this:
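import os, random
# Assumes each entry is its own directory in the unpacked torrent.
entries = [name for name in os.listdir(".") if os.path.isdir(name)]
random.shuffle(entries)
# Save the shuffled order so you can work through it over several sessions.
with open("judge-order", "w") as f:
    f.write("\n".join(entries) + "\n")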
A Greasemonkey script would also be doable!
Thanks for the graphs! They're very interesting to look at. Can you also make a graph of "number of judges" vs. "alphabetical order of short entry name"? That's the directory order in the torrent.

I'd also appreciate these randomization ideas.
Just put all the entries you've rated in order at the bottom. Shuffle the ones you haven't rated.
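The suggestion above (unrated entries shuffled at the top, rated ones in alphabetical order at the bottom) could be sketched like this, assuming a case-insensitive alphabetical order and a hypothetical `rated_by_user` set. Passing an `rng` seeded by user name, as suggested earlier in the thread, would keep the unrated part from jumping around between page loads.

```python
import random


def listing_order(all_entries, rated_by_user, rng=random):
    """Unrated entries first in random order, then rated entries alphabetically."""
    unrated = [e for e in all_entries if e not in rated_by_user]
    rated = sorted(
        (e for e in all_entries if e in rated_by_user), key=str.casefold
    )
    rng.shuffle(unrated)  # shuffles the freshly built list in place
    return unrated + rated


entries = ["aiamsori", "null", "Yukkuri", "Moon Miner", "Zombie Zoo"]
print(listing_order(entries, {"Yukkuri", "null"}))
```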

Tee: Here's the graph of number of judges vs. alphabetical ordering of the short team name. (I assumed Unix file ordering, so case-sensitive. I imagine the majority of judges use that.)

The trend is not nearly as evident as when graphed against the long entry name. Keep in mind that for most entries the short and long names start with the same letter. You can actually see two downward trends: first for the uppercase names, ending at Yukkuri, and second for the lowercase names, starting at aiamsori. A notable outlier is "null", whose long name is "0 != None" and which got a large number of judges, presumably because a name starting with "0" sorts to the top of the entries page.

So I don't think order in the torrent makes much difference, except insofar as it reflects the order on the web page.

Thanks, interesting. I use Linux, which is case-sensitive, but ls doesn't list them in case-sensitive order, and Windows doesn't list its directories in case-sensitive order either. Also, I notice the graph tends to go back up when you get to the lowercase names, so it might be worth looking at the case-insensitive ordering (I'd be interested, if you don't mind the work). But yes, I can see that the alphabetical ordering of long names has a stronger effect on the number of judges.
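The two orderings differ because a plain code-point sort puts every uppercase letter before every lowercase one, which is why the uppercase names form their own run ending at Yukkuri. A quick way to compare both orders in Python, using three short names mentioned in this thread; `str.casefold` is only a rough stand-in for the locale-aware, case-insensitive ordering that ls and Windows use.

```python
# Short names taken from the discussion above.
names = ["null", "Yukkuri", "aiamsori"]

# Case-sensitive (code-point) order, like `LC_ALL=C ls`:
# all uppercase letters sort before all lowercase letters.
print(sorted(names))  # ['Yukkuri', 'aiamsori', 'null']

# Case-insensitive order, a rough approximation of most directory listings.
print(sorted(names, key=str.casefold))  # ['aiamsori', 'null', 'Yukkuri']
```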