Statistics of judging results
I think it's interesting to look at the results, see which entries are outliers, and maybe get some hints for the future. I thought I'd share a little bit.
Scores in all three categories are positively correlated, most strongly for fun/production (except among team entries, where production/innovation is marginally higher). Here's a table of correlation coefficients:
          Ind    Team   All
Inn/Fun   0.50   0.59   0.55
Pro/Fun   0.83   0.76   0.77
Pro/Inn   0.54   0.77   0.65
Here are graphs of the three pairs of categories.
And here's a question I was curious about: does being at the top of the listings page mean that more people rate your entry? Looking at the data, absolutely:
If we throw out the outlier cranberry (which shouldn't be in there since it's not actually on the entries page), the correlation between position on the listings page and number of judges is r = -0.64. There's also a medium-sized correlation (r = 0.31) between more judges and higher scores. So you might think that an entry earlier in the alphabet has a better chance. However, alphabetical order is not correlated with score (r = 0.01). Still, I wonder if something can be done to encourage judges to spread out their ratings more evenly.
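For anyone who wants to reproduce this kind of number, here is a minimal sketch of how a Pearson correlation coefficient can be computed in plain Python. The function name pearson_r and the sample lists are my own illustration; the numbers are made up and are not the PyWeek data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative numbers (NOT the PyWeek results):
# position of an entry on the page, and how many judges rated it.
positions = [1, 2, 3, 4, 5, 6]
judges = [30, 28, 25, 27, 20, 18]
r = pearson_r(positions, judges)
print(round(r, 2))
```

A negative r here means entries further down the page tend to have fewer judges, which is the pattern described above.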
Anyway, just some thoughts. Mostly I'm pleased that PyWeek is large enough to do this sort of analysis in the first place! I uploaded the spreadsheet here if you want it.
For random ordering, a simple solution would probably be to do random.seed(hash(user.name)); random.shuffle(entries) before listing them, so the order does not need to be stored. If an ordered list is also available, it may not be as inconvenient as I imagined.
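A rough sketch of that per-user shuffle, under a couple of assumptions of mine: the function name entries_for_user is hypothetical, and the site is assumed to have the user name available as a string. One caveat is that Python 3 randomizes hash() of strings per process by default, so this sketch seeds a private random.Random with the name itself, which gives the same order on every run.

```python
import random

def entries_for_user(username, entries):
    """Return a copy of entries in an order that is random but fixed per user."""
    # Seed with the string itself rather than hash(username): str hashes are
    # randomized per interpreter process in Python 3, string seeds are not.
    rng = random.Random(username)
    shuffled = sorted(entries)  # start from a canonical order
    rng.shuffle(shuffled)
    return shuffled

# Entry names taken from this thread, purely as sample input.
entries = ["aiamsori", "cranberry", "null", "Yukkuri"]
order_a = entries_for_user("cyhawk", entries)
order_b = entries_for_user("cyhawk", entries)
print(order_a == order_b)
```

Using a separate Random instance also avoids reseeding the global generator, which other code on the site might rely on.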
Tee: Here's the graph of number of judges vs. alphabetical ordering of the short team name. (I assumed Unix file ordering, so case-sensitive. I imagine the majority of judges use that.)
The trend is not nearly so evident as when graphed against long file name. Keep in mind that the majority of entries' short and long names start with the same letter. You can actually see two downward trends: first for the uppercase, ending at Yukkuri, and second for the lowercase, starting at aiamsori. A notable outlier is "null", whose long name is "0 != None" and got a large number of judges.
So I don't think order in the torrent makes that much difference, except insofar as it reflects order on the webpage.
Comments
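import os
import random

# Shuffle the entries in the current directory into a random judging order.
entries = os.listdir(".")
random.shuffle(entries)
with open("judge-order", "w") as f:
    f.write("\n".join(entries))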
I'd also appreciate these randomization ideas.
cyhawk on 2009/05/17 07:16:
Ordering the games randomly (in a different order for every user) would solve this issue, but it would be somewhat inconvenient for the judges. Alternatively you could receive some benefit from judging all entries. I know I felt like a hero when I finished judging the last entry :). Giving extra score to the judge's game would be a bit too much I think, but something little, like an official award or a small icon by the user name, may just be enough incentive to get more people to judge all games.