Online Highscores

Not really anything to do with the competition, but I was wondering if anyone had ideas on how to implement an online high-scores system in a piece of open source software. I mean, the implementation would be easy enough, but how do you make sure that the scores are valid and haven't been submitted through a hacked version of the code?

I'm interested to know if anyone has some good ideas on how to do it, and secondly, if I try making one over the next few days, can I get you people to try and use it to submit an invalid score?

Then if you do manage to, you can tell me what you did and I can try to close that hole.


Comments

Ooh, that's a good question. It's like achievements, which we were talking about on IRC the other day.

You can do some kinds of source trickery to obscure how scores are submitted, but this only makes it harder, not impossible. For example, monkey-patching the standard library in subtle ways in obscure places would make it very difficult to track down why you can't submit scores. You can exploit the fact that, though people have complete access to the game source, they have incomplete knowledge of the high-scores API.

But you're safer to assume that people can submit dodgy scores, and to ensure that you collect enough information to spot nefarious behaviour. So log everything that goes over the API, including the client IP, so that you can scrutinise it all later. I would also require as much information as possible - up to and including full replays of the high-scoring games. Spotting cheats is then just a case of plotting graphs and looking for clear outliers. The more facts you collect, and can verify, the better equipped you are to spot cheating. Some facts are quite easy to assess, like best score versus total number of scores submitted under the same user id, or average rate of scoring (or the average factor by which the score grows, if your game has more of an exponential scoring curve). These pieces of information can be quite hard to fabricate convincingly.
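
For example, the server side could log and sanity-check submissions roughly like this (an untested sketch - the schema, thresholds and names are all invented):

    import sqlite3
    import time

    db = sqlite3.connect("scores.db")
    db.execute("""CREATE TABLE IF NOT EXISTS submissions (
        user_id TEXT, score INTEGER, duration REAL,
        client_ip TEXT, submitted_at REAL)""")

    def log_submission(user_id, score, duration, client_ip):
        # Log every submission, plausible-looking or not, for later scrutiny.
        db.execute("INSERT INTO submissions VALUES (?, ?, ?, ?, ?)",
                   (user_id, score, duration, client_ip, time.time()))
        db.commit()

    def looks_suspicious(user_id, score, duration):
        # Cheap first-pass checks; anything flagged gets closer inspection.
        if score / max(duration, 1.0) > 50:   # scoring-rate cap, tune per game
            return True
        rows = db.execute("SELECT score FROM submissions WHERE user_id = ?",
                          (user_id,)).fetchall()
        previous = [r[0] for r in rows]
        # An implausible jump over the player's own history is a red flag.
        return bool(previous) and score > 10 * max(previous)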

Complete replays are more difficult to assess, as you need to check that the constraints of the game are never violated, and even then you might want to check for robotic behaviour (a series of replays submitted by the same IP which all take the same actions at the exact same moments in the game, for example).

I'd certainly enjoy having some sort of online high-scores system already built and ready for next Pyweek, or for any game. If you get it written I'd be willing to try it out and test it.

Complete replays are more difficult to assess, as you need to check that the constraints of the game are never violated,

I'm not sure if this is what you're saying, but submitting complete replays seems like an excellent way to make fake high scores much more difficult to submit. It'd probably be about as difficult as with a closed-source game - i.e. still not impossible.

Record the random number seed at the beginning of the game. At each frame, record the time and all the user input (keys, mouse position). These things together should be enough to reconstruct the entire game, and on the server side you can verify that this generates the high score they say it does. Yes this is more data you're transmitting, but based on my rough calculations, if you do it intelligently, you can get it down to around 12kb per minute of gameplay.
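
Something like this untested sketch is what I have in mind (all the names are made up):

    import json
    import random
    import zlib

    class ReplayRecorder:
        def __init__(self):
            # One seed up front; all in-game randomness must flow from it.
            self.seed = random.randrange(2**32)
            random.seed(self.seed)
            self.frames = []

        def record_frame(self, dt, keys, mouse_pos):
            # Per frame: the time delta plus everything the player did.
            self.frames.append([dt, sorted(keys), list(mouse_pos)])

        def serialize(self):
            # Compressed JSON; input data is repetitive, so it shrinks well.
            blob = json.dumps({"seed": self.seed, "frames": self.frames})
            return zlib.compress(blob.encode("utf-8"))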

We tried this last time around with Intestinvaders! - it's really really hard to get it to work right. The number of possible sources of non-determinism is huge, and it's very difficult to be sure you've got rid of them all. You can fix your frame rate, seed your RNG and record all your input variables, but simple things like the ordering of keys in a dict can still change the behaviour of your game hugely, and that's before you get into the messes that are 32/64-bit inconsistencies and floating point rounding modes.

If you do try and do this, my recommendation would be to have heavy testing on multiple platforms at every stage.
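
To give a flavour of the gotchas, here are two of them in miniature (illustrative code only):

    # 1. Dict iteration order: on some Python versions and platforms it
    #    can differ between runs, so iterate in an explicit order.
    entities = {"goblin-3": 10, "goblin-1": 7}
    for name in sorted(entities):          # deterministic everywhere
        print(name, entities[name])

    # 2. Frame times: never feed raw wall-clock dt into the physics.
    #    Accumulate it and simulate in fixed steps instead.
    FIXED_DT = 1.0 / 60.0

    def physics_step(state, dt):           # stub for the real simulation
        state["x"] += state["vx"] * dt

    def advance(state, accumulator, real_dt):
        accumulator += real_dt
        while accumulator >= FIXED_DT:
            physics_step(state, FIXED_DT)  # identical result on every machine
            accumulator -= FIXED_DT
        return accumulator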

Excellent point. I can easily see dict key ordering and floating-point issues causing really subtle differences that become important.

In theory, these things could be important even if you're not sharing game replays: they could make a jump possible on one system and impossible on another system. In practice, that's not likely to happen, but it's probably something everyone should be aware of if you're designing something that requires really delicate timings.

Not to mention the architecture problem. Now not only does your game need to run on the client, it needs to run on the server.

Also think about the main loop of a game - you get that update() main loop with a time delta and use that in your physics to make the game run smoothly. You'd have to synchronize that on the server too.

I wouldn't use a random number generator to control any pertinent aspect of a game with online highscores.

Actually, we found this to be one of the easier parts to pin down - calls to the RNG are pretty nicely signposted, after all. It's the more subtle non-determinisms which make you tear your hair out.

No, I mean, if you're competing on score you should expect to be playing the exact same content, so everything in the game should be deterministic. I shouldn't be able to get a better score just because I kept hitting lucky breaks that weren't offered to you. I suppose you could control things with a PRNG if you always seed it to the same value.

Ah, yes, I see what you mean now. That's exactly what we did in Intestinvaders! - each track is (pseudo)randomly generated based on a seed determined by its name. Once you're in game, everything is deterministic (or should be - I think there's still a bug or two).
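
Roughly like this (an untested sketch - the names are invented):

    import hashlib
    import random

    def rng_for_track(track_name):
        # Hash the track name rather than using hash(), whose value can
        # differ between runs and platforms.
        digest = hashlib.sha256(track_name.encode("utf-8")).digest()
        return random.Random(int.from_bytes(digest[:8], "big"))

    # Every player who picks this track gets the identical layout.
    rng = rng_for_track("gut-reaction")
    layout = [rng.choice("LRS") for _ in range(20)]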

Not to mention the architecture problem. Now not only does your game need to run on the client, it needs to run on the server.

Hardly. None of the graphics or sound or keyboard/mouse code or anything fancy has to run on the server. The only thing that has to run is your game loop and physics engine.
 
Also think about the main loop of a game - you get that update() main loop with a time delta and use that in your physics to make the game run smoothly. You'd have to synchronize that on the server too.

It doesn't have to run in real-time on the server. Like I said, save the time at each frame so you know the delta-t and use that to override whatever dt you're getting from the game loop. Synchronization's not really an issue: it could run slower or (more likely) faster and still work fine.
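
In sketch form (untested - game_cls and its methods here are hypothetical):

    import random

    def verify_replay(replay, claimed_score, game_cls):
        # Reproduce the client's randomness from the recorded seed.
        random.seed(replay["seed"])
        game = game_cls()                      # headless: no graphics or sound
        for dt, keys, mouse_pos in replay["frames"]:
            game.apply_input(keys, mouse_pos)  # hypothetical input API
            game.update(dt)                    # recorded dt, not wall-clock time
        return game.score == claimed_score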

I think there is merit in using the frailty of the conditions under which the game behaves deterministically to verify that the recording is genuine, but there are all kinds of drawbacks. For example, you have to fire up an instance of the world and run through an entire game just to check one score submission, which is potentially memory- and CPU-intensive. I'd prefer an architecture where basic sanity checks are cheap and more extensive verification is applied only where necessary.

Also, replays encoded like this would get invalidated by game updates much more frequently than replays stored in a less fragile format. For example, say you add a new power-up. Under the determinism model, you've potentially taken one more bit of entropy from the RNG in choosing which power-up to spawn. If you just store the type of power-up that was spawned and when, your replay is still valid in the updated game.
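
Concretely, I'm imagining the replay carrying event records like these (purely illustrative):

    replay_events = [
        {"t": 12.5, "event": "powerup_spawn", "kind": "shield"},
        {"t": 13.1, "event": "powerup_collected", "kind": "shield"},
        {"t": 20.0, "event": "enemy_killed", "points": 150},
    ]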

Also, replays encoded like this would get invalidated by game updates much more frequently than replays stored in a less fragile format.

That's true but I don't think it's a problem. Any such update should invalidate high scores made on previous versions. You really need to blank out the leaderboard with every update, no matter how you're verifying the scores.

Maybe. There are obviously changes that will invalidate previous replays and scores, but I think it should be at the developer's discretion for each update. If an update is in the direction of allowing higher scores, then existing high scores are still fine, because they just become easier to beat. The positive is you don't have to annoy players by wiping their scores. And I chose the power-ups example carefully: adding power-ups should not necessarily invalidate the old replays - but it depends whether they are intended for players to learn from and brag with, or to directly compete against.

Fair enough, it's your call. However, a wise person once said:

if you're competing on score you should expect to be playing the exact same content,

I'm just saying. :)

I don't think that's incompatible. There I was talking about deterministic versus stochastic events in the game, to give a level playing field for players competing on score. Each game update may change the pitch, but it can still offer a level playing field.

If you just store the type of power-up that was spawned and when, your replay is still valid in the updated game.

But then you're vulnerable to fake replays that spawn powerups when the game wouldn't normally have done so.

The only scheme that's anywhere near foolproof is to recalculate everything from player inputs. Even then you may have a problem with bots that play superhumanly well.

This problem seems to be a very large one. I will try writing up something very basic later today if I get a chance during a break from doing D&T. I'm not trying to make something that's impossible to hack, only something that makes it more difficult. Still, does anyone have any ideas on how to do it when the user potentially has the ability to run from source code and mess with scores?

Full reproducibility is the best way to validate high scores. There might be some mileage in obfuscation as a disincentive to attempt cheating, but it'd never be enough in an open source project.

I don't see why you'd ever throw high score data out. You should just version it. If people want to see high scores set on patch X or later let them do that. Then there's no need for score invalidation to occur at all. The community of players will decide what changes will matter to them.

That's how I'd do it: fully reproducible and accurately versioned high-score data. Of course, such versioning is awkward to make compatible with keeping the replay data accessible. That'd need some nifty versioning architecture in the code as well.
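
A sketch of the storage I have in mind (untested - the schema is invented, and it assumes version strings are zero-padded so they sort in release order):

    import sqlite3

    db = sqlite3.connect("scores.db")
    db.execute("""CREATE TABLE IF NOT EXISTS scores (
        user_id TEXT, score INTEGER, game_version TEXT, replay BLOB)""")

    def leaderboard(min_version, limit=10):
        # Nothing is ever deleted; the viewer just filters by version.
        return db.execute(
            "SELECT user_id, score, game_version FROM scores "
            "WHERE game_version >= ? ORDER BY score DESC LIMIT ?",
            (min_version, limit)).fetchall()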

gcewing said: But then you're vulnerable to fake replays that spawn powerups when the game wouldn't normally have done so.

You don't have to trust these data. If your powerups are non-random, as I was advising, then you can check that they spawn at the correct times for the current version of the game. If they are random then perhaps you should use a separate RNG for powerups and save that seed. But you need this information in your replays in case your power-up spawning changes.
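
For example (untested sketch, invented names):

    import random

    class PowerupSpawner:
        def __init__(self, seed):
            self.seed = seed                  # stored in the replay
            self.rng = random.Random(seed)    # isolated from other randomness

        def maybe_spawn(self, t):
            # Power-up draws come only from this RNG, so changes to the
            # rest of the game's random behaviour can't desync the replay.
            if self.rng.random() < 0.01:
                return {"t": t, "kind": self.rng.choice(["shield", "bomb"])}
            return None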

Chard said: There might be some mileage in obfuscation as a disincentive to attempt cheating, but it'd never be enough in an open source project.

Subtle bugs do lie around in open-source projects for years. I think with misdirection you could at least deflect some crude attacks. Core developers who spot shenanigans might be persuaded to keep the secret.

Chard said: You should just version it. If people want to see high scores set on patch X or later let them do that. Then there's no need for score invalidation to occur at all.

I think having a different high-score table for each version is a little crude and confusing, and doesn't really solve the problem, because the latest high score table is basically the real high score table, and the previous ones are historical relics. So you're still effectively throwing out scores.

the latest high score table is basically the real high score table, and the previous ones are historical relics. So you're still effectively throwing out scores.

If you consider that to be true, then merely releasing an updated version of the game would be "throwing out high scores".

It's a historical fact ("relic" if you want to use that term) that certain people got certain scores using version N of the game, and those people would probably like that information to be preserved. It's also an unavoidable fact that scores obtained from version N are not directly comparable with those from version N+1. So keeping them separate seems like a sensible thing to do.

The old high score tables need not be completely dead. If you retain the ability to check replays for earlier versions (by keeping all previous versions of the code or otherwise), then someone who's using version N can still submit scores and have them recorded in the appropriate table.

Releasing a new version is deprecating high scores, certainly, but it isn't deleting them or moving them onto some backwater high-score table.

It's also an unavoidable fact that scores obtained from version N are not directly comparable with those from version N+1.

Agreed, but they may be practically comparable, in which case it penalises players to throw them out, or virtually incomparable, in which case it may penalise players to keep them in.

In my view there's just one high-score table that players are competing to get on, and the path that developers should take is the one of least penalty to players who have striven to get onto it.

Also, just a note to anybody that was waiting: I don't think I'm going to have time to work on this, because I have some major projects to work on for school.