Avoiding race conditions in GenServer
02 Mar 2017Suppose we are making a multi-player game. We are choosing to store the state of the game in
a struct (Game
). The game will start without any players and players will
individually choose to join the game. We’ve determined that there must be a maximum number of players
(@max_players
) and the players will be represented by a list of strings (player nicknames).
Let’s try this out in iex
.
Seems to be working! Now we need a way to wrap this in a process
. OTP
provides a number of ways
to do this. In this case we are going to use a GenServer. Each game will get it’s own
isolated process which can be identified by a PID
. Let’s create our GenServer which we will call GameServer
.
We have two calls here
get_state
returns the current game state.add_player
returns{:error, reason}
if the player couldn’t be added and:ok
otherwise.
Let’s try this out in iex now:
A hidden race condition
This seems to be working in the repl, but if you deployed this into production you may find that every
once in a while a 5th or even 6th player will sneak in. This code has a race condition! Let’s prove this by adding
a temporary function to GameServer
which adds n
players simultaneously:
This function takes a list of nicks and calls add_player
simultaneously from one process per nick.
Let’s try it out with 5 players attempting to join.
Wait, how did richard
get in there?
The erlang process model is a great way to handle concurrent operations on state and if done right it’s quite hard to mess up your state in this way. You just need to think a little up front when you are mutating state.
In order to demonstrate how this is happening, let’s add some puts
into our GameServer.add_player
function.
And when we run again.
This is a classic check-then-act race condition. It’s important to remember that the way GenServers work is
they operate on each message in their mailbox one at a time and in the order they are received.
The 5 get_state
messages come in and the server processes those and returns the current state. Then every player
sees a stale state and 5 new messages come in to add the players to the game state.
In order to fix this, we need to make add_player
an atomic operation.
GenServer makes this easy for us because the process model ensures that everything that happens in the callback is atomic!
So all we need to do is move this logic into the add_player
callback.
Trying again now.
Now our state is consistent like we’d expect! When richard tried to join he would see the max_players
error.
The nice thing about the semantics of GenServer is that you can think about your code in terms of clients
and servers
.
All public API functions on the top of the module happen in the calling process or client side.
All handle_*
callbacks happen in the server process or server side.
If you maintain this mental model it becomes obvious which side your code belongs in.
On bottlenecks
While you may be tempted to throw everything in the server side for safety, you need to consider the performance implications of doing so. When writing callbacks you should be asking yourself
What are the minimal operations I need to determine the next state?
It’s important to remember that callbacks can only happen one at a time so they can easily become a bottleneck. Be careful about performing long operations (particularly network operations) in the callbacks if you can get away with doing them somewhere else.
A holistic approach
Although this works as an example on GenServers and state, I’d suggest that this problem has a deeper, root cause. There is a design flaw
with the Game
datastructure. When designing a new datastructure, it is a good idea to guard it from ever getting
into an invalid state. The logic of keeping the structure consistent belongs in the module itself. You should then encourage
the programmers to only use that interface to manipulate the structure and return descriptive errors when something is wrong.
In this case, I might change the interface of the Game
module instead to something like this:
This will leave your GenServers pretty bare but I believe this is a good thing. You can consider something like exactor to make them even smaller.