2015/07/26

The Game State Machine

This post is about lots of things: It's about words with so many meanings they become meaningless. It's about what code runs every frame in the game. It's about one reason I don't use Unity
Ultimately, it boils down to flow control. I'm not talking about conditional jumps and loops, I'm talking about transitions between conceptually and logically distinct phases of the game.

Of Game States, Screens & Scenes

One of the problems in discussing this, like much else, is language. All of the terms above are somewhat overloaded, but the worst offender, I think, is game state. Most people you might ask will have some idea of what it means, but there is lots ideas to be had:
  • Mathematicians and game designers might think of game state in terms of rules: every valid arrangement of chess pieces on a board is a game state of chess, and different actions by the players will transform the game state;
  • A computer scientist might think of video games as programs, so the game state should refer to the program state of the game, i.e. all the information stored in its memory;
  • Some programmers, including the XNA developers, seem to think of phases of the game's execution, where the game behaves differently.
You could model your engine after that first definition, but you would probably end up with an engine that can only deal with a small set of turn based games. Modeling the engine after the second definition doesn't really make sense: Managing the program state is the job of the program, no matter how you model it.
The third definition, I find, makes the most sense in the context of programming. Modeling your engine around this requires you to turn the game logic subsystem into a state machine, hence the name game state.

Looking at the Ruby source code of the RPG Maker engine, you'll find that it works just like that, only that it calls the states Scenes. Sounds familiar?
Unity is all about scenes, but they use a different definition, essentially equating them with sections of the game world. Unity is not modeled around game states at all, you use a scene to represent your title screen, another to represent your menu screen, one for World 1-1, another for World 1-2. It assumes that the conceptual difference between the world map and a level accessed through that map is the same as the difference between any two levels. I find this one-size-fits-all solution to be very annoying to work with, so I don't.
To get back on topic, Unity's definition of a scene is similar to that used in 3D modeling: A scene refers to a collection of objects and their relative positions. In fact, it's also similar to the definition used in theatre: There's a description of what's on stage (background, actors) and what happens (the script). If the actors or the background change you go to the next scene.

If you look at the terms I used here, title screen and menu screen among them, you might ask why I don't call states screens and what I would define a screen as. Among the three, it is for sure the least used term among developers, but it is the most common among players: they talk about inventory screens, battle screens, status screens, loading screens...
In my mind, screens are substates of game states, with some acknowledgment of difference in content as well. For example, in the inventory state, you might come across an item list screen, an item description screen, and, in some RPGs, a party screen ("Who do you want to use this item?").

Managing Game State

So if it's not clear already, I want my engine to be aware of game state. So now we need to make a plan as to how to actually implement it.
Just to give you an idea of two extremes I've seen in practice: the Pokémon games I used to mod literally had an integer variable and a jump table as state management. Compare that to the RPG Maker's SceneManager, which uses a fully object oriented model of states, and a stack to keep track of what states came before the current one.

According to automata theory, there's a couple of sensible models:
  • The simplest construct is the finite state machine. Therein, there is only the current state, and the next state is solely determined by the current state and the input it received. However, since we're working in a Turing complete language, the logic to determine the next state can be arbitrarily complex. If one of your states wants to return to the previous states, it can keep track of the previous state itself, while the manager still operates like a FSM.
  • A pushdown automaton has that logic baked in: Formally, it extends the finite state machine with a stack, shared by all states, that can be filled with arbitrary data. Transitions may depend on current state, input and the top of the stack. In practice, since we have RAM, which is shared by all states anyway, we'd really only push new states in and pop them off when we want to return to the previous one, considering the top of the stack as the current state.
  • In a hierarchical state machine, a state may define substates (the current substate being on top of the stack), which extend the logic of the superstate, and cascade down the stack. As in the linked example, you could implement that with inheritance as well.
We could go on like this for quite a while, all the way up to Turing machines, but I guess this is where I draw the line. We're already working in a Turing complete language, so why build a Turing complete state manager? Telling it what to do would be equivalent to just hacking in the feature you want, except you wouldn't be used to the semantics, and the code running in the background wouldn't be battle tested.
A finite state machine, unintuitively suffers from the same problem: It's feature set is so minimalistic, you'd be hacking in just as much, except this time, you're doing it in a familiar language. Better, but still bad.
Both the pushdown automaton and the hierarchical state machine share a desirable property: they facilitate code reuse to simplify your code. We will use a stack based hierarchical state machine, as it is essentially a pushdown automaton with random access to the previous state; having a Previous property on a GameState is a very low cost compared to the additional power it provides.

Implementation Details

Now that we've decided on the theoretical construct behind our manager, we need to consider all the nitty gritty details; looking back at the two extremes, they don't only display different theoretical models, they also showcase different paradigms of programming.
Way back when I was writing the byte code interpreter, I used an array of delegates and a MemoryStream, the closest I could get to a jump table. This time, however, I feel like the object oriented approach is more viable; for one, state objects will bundle lots of methods together, and the interface is easier to enforce with an abstract class. More importantly, since the state hold some actual data they should manage, representing them as data structures (although delegates technically are data structures as well) just makes more sense intuitively.

So we start with two classes that might look like this. We'll talk about why I'm not passing around GameTime in a later blog post, but there are more pressing issues at hand.
First and foremost, we need the option to change the stack to turn this from a layer of indirection to an actual state machine. The basic stack operations are Push and Pop. In this context, they refer to entering a substate and returning to the superstate. To change substates of the same superstate, you'd first pop the current state and immediately push the new state. We'll also provide a method for that.
But just deciding on these three methods doesn't answer all our questions. For instance, how should you refer to the game state you want to transition to?
You could just pass in the GameState instance you want and be done with it, but then we still need to talk about where to get that from; we can't make game states static classes because they need to inherit GameState, but we could make them singletons or provide them as properties of GameStateMachine. We could use a dictionary and refer to them with string handles or an enum.

Or we don't enforce any such limit and allow new instances to be pushed freely.
That has the advantage that, if you can come up with a scenario where the same state has to be present in the stack multiple times (I couldn't), that's no more difficult than any other. It also makes it easy to pass info into the state you want to push, since you are responsible for its construction.
The issue I see with it, though, is that then you can't very well initialize all states upfront, which might be a bit too costly for fluent transitions. But, if we allow states to be initialized freely, we also allow them to be singletons or whatever else, so the states that need that optimization can get it individually.

So, if we can get info into the state we're pushing, how do we get it out when we return?
Well, if we provide random access to the previous state, there isn't much I could do to stop you from casting to specific substates and calling arbitrary functions, but I think that's backwards. A function doesn't care who called it, and returns whatever it wants. It's the caller's job to figure out what the output means, should there be uncertainties.
So, the substate provides its output to the GameStateMachine.ExitSubstate method, and the state machine makes sure the superstate gets it. One bug I anticipate with this, would occur when you switch substates (i.e. pop and immediately push a new one) and the new substate Exits with some output data the superstate didn't expect because it called a different substate. If it didn't expect any output, it'd probably discard it and be done with it. But if it expected a different kind of output, we'll get undefined behaviour, depending on how the state handles the input.
There's no easy fix for it either, as you could design your state transitions around that, think polymorphism. The only solution I see for this problem is to be aware of it, and make sure the logging makes clear what happens, in case it ever happens accidentally.

Finishing Touches

With the elephant out of the room, what do we need to do to wrap this up?
  • States need to have access to their superstate. We will add the public GameState Previous property I promised earlier and have the GameStateMachine set it when the state is pushed.
  • States should be notified when they are entered, so we will add two empty virtual mehods Enter and Return to the GameState class. The GameStateMachine will call these after a state has been pushed or its substate popped, respectively.
  • Some states may want to override input handling but delegate update logic to the superstate. To allow this scenario in particular, we'll add GameState.HandleInput, which will be called before GameState.Update in the state machine's own Update method.
  • Both classes still lack initialization code, and the GameStateMachine has to be set up in the MainGame.
Those should be some pretty simple touch-ups to do on your own, in case anyone is actually following along here. This post is long enough as it is, and I don't want to split it so we can move along a little more quickly from now on.

No comments:

Post a Comment