Some explanations about Desyncs
Last updated: 2014-02-23
------------------------------------------------------------------------

 * 1.1) OpenTTD multiplayer architecture
 * 1.2) What is a Desync and how is it detected
 * 1.3) Typical causes of Desyncs
2.0) What to do in case of a Desync
 * 2.1) Cache debugging
 * 2.2) Desync recording
3.0) Evaluating the Desync records
 * 3.2) Evaluating the replay
 * 3.3) Comparing savegames

1.1) OpenTTD multiplayer architecture
---- --------------------------------
OpenTTD has a huge gamestate, which changes all the time.
The savegame contains the complete gamestate at a specific point
in time. But this state changes completely each tick: vehicles move
and trees grow.

However, most of these changes in the gamestate are deterministic:
without a player interfering, a vehicle always follows its orders
in the same way, and trees always grow the same.

In OpenTTD multiplayer, synchronisation works by creating a savegame
when a client joins, and then transferring that savegame to the client,
so it has the complete gamestate at a fixed point in time.

Afterwards clients only receive 'commands', that is: stuff which is
not predictable.

These commands contain the information on how to execute the command,
and when to execute it. Time is measured in 'network frames'.
Mind that network frames do not match ingame time. Network frames
also run while the game is paused, to give a defined behaviour to
stuff that is executing while the game is paused.

The deterministic part of the gamestate is run by the clients on
their own. All they get from the server is the instruction to
run the gamestate up to a certain network time, which basically
says that there are no commands scheduled in that time.

When a client (which includes the server itself) wants to execute
a command (i.e. a non-predictable action), the following happens:
- The client calls DoCommandP or DoCommandPInternal.
- These functions first do a local test-run of the command to
  check simple preconditions. (This just gives the client an
  immediate response without bothering the server and waiting for
  its reply.) The test-run must not actually change the
  gamestate; all changes must be discarded.
- If the local test-run succeeds, the command is sent to the server.
- The server inserts the command into the command queue, which
  assigns a network frame to the command, i.e. when it shall be
  executed on all clients.
- Enhanced with this specific timestamp, the command is sent to all
  clients, which execute the command simultaneously in the same
  network frame in the same order.
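
The flow above can be sketched in a few lines of Python. This is only
an illustration, not OpenTTD code: the classes 'Server' and 'Client',
the 'delay' parameter and the single 'money' counter standing in for
the whole gamestate are assumptions made for this example, and the
local test-run is left out.

```python
from collections import defaultdict


class Server:
    """Stamps each accepted command with a network frame (hypothetical model)."""

    def __init__(self, delay=2):
        self.frame = 0
        self.delay = delay  # assumed lead time so every client gets the command in time

    def schedule(self, command):
        # The frame in which all clients have to execute the command.
        return (self.frame + self.delay, command)


class Client:
    """Runs the deterministic game loop and applies commands at their frame."""

    def __init__(self):
        self.frame = 0
        self.money = 0  # stand-in for the whole gamestate
        self.queue = defaultdict(list)

    def receive(self, stamped):
        frame, command = stamped
        self.queue[frame].append(command)

    def run_until(self, target_frame):
        while self.frame < target_frame:
            self.frame += 1
            self.money += 1  # deterministic part: identical on every client
            for command in self.queue.pop(self.frame, []):
                self.money += command  # commands run in arrival order within the frame


server = Server()
clients = [Client(), Client()]
for stamped in (server.schedule(-10), server.schedule(25)):
    for client in clients:
        client.receive(stamped)
for client in clients:
    client.run_until(5)
```

Both clients end up with the same 'money' value, because they ran the
same deterministic ticks and applied the same commands in the same
frame.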

1.2) What is a Desync and how is it detected
---- ---------------------------------------
In the ideal case all clients have the same gamestate as the server
and run in sync. That is, vehicle movement is the same on all
clients, and commands are executed the same everywhere and
have the same results.

When a Desync happens, it means that the gamestates on the clients
(including the server) are no longer the same. Just imagine
that a vehicle picks the left line instead of the right line at
a junction on one client.

The important thing here is that no one notices when a Desync
occurs. The desynced client will continue to simulate the gamestate
and execute commands from the server. Once the gamestate differs,
it will increasingly spiral out of control: if a vehicle picks a
different route, it will arrive at a different time at a station,
which will load different cargo, which causes other vehicles to
load other stuff, which causes industries to notice different
servicing, which causes industries to change production, ...
The client could run all day in a different universe.

To limit how long a Desync can remain unnoticed, the server
transfers some checksums for the gamestate every now and then.
Currently this checksum is the state of the random number
generator of the game logic. A lot of things in OpenTTD depend
on the RNG, and if the gamestate differs, it is likely that the
RNG is called at different times, and the state differs when
checked.

The clients compare this 'checksum' with the checksum of their
own gamestate at the specific network frame. If they differ,
the client disconnects with a Desync error.
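
A minimal sketch of this check, assuming a toy 'Game' class with its
own random number generator. The generator below is a simple linear
congruential one chosen for the example; it is not the generator
OpenTTD uses, and 'extra_rng_calls' is a hypothetical knob that
simulates a diverged gamestate.

```python
class Game:
    """Tiny stand-in for a gamestate with its own random number generator."""

    def __init__(self, seed):
        self.rng_state = seed
        self.frame = 0

    def random(self):
        # Simple linear congruential generator, for illustration only.
        self.rng_state = (self.rng_state * 1103515245 + 12345) % 2**31
        return self.rng_state

    def tick(self, extra_rng_calls=0):
        self.frame += 1
        for _ in range(1 + extra_rng_calls):
            self.random()


def in_sync(server, client):
    # The 'checksum' is just the RNG state at the same network frame.
    return server.frame == client.frame and server.rng_state == client.rng_state


server, client = Game(seed=42), Game(seed=42)
server.tick()
client.tick()
synced_before = in_sync(server, client)

server.tick()
client.tick(extra_rng_calls=1)  # the client's gamestate diverged and drew one more number
synced_after = in_sync(server, client)
```

The first comparison matches, the second one does not, which is the
point at which a real client would disconnect with a Desync error.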

The important thing here is: the detection of the Desync is
only an ultimate failure detection. It does not give any
indication of when the Desync happened. The Desync may after
all have occurred long ago, and just did not affect the checksum
up to now. The checksum may have matched 10 times or more
since the Desync happened, and only now the Desync has spiraled
enough to finally affect the checksum. (There was once a Desync
which was only noticed by the checksum after 20 game years.)

1.3) Typical causes of Desyncs
---- -------------------------
Desyncs can be caused by the following scenarios:
- The savegame does not describe the complete gamestate.
  - Some information which affects the progression of the
    gamestate is not saved in the savegame.
  - Some information which affects the progression of the
    gamestate is not loaded from the savegame.
    This includes the case that something is not completely
    reset before loading the savegame, so data from the
    previous game is carried over to the new one.
- The gamestate does not behave deterministically.
  - Cache mismatch: The game logic depends on some cached
    values, which are not invalidated properly. This is
    the usual case for NewGRF-specific Desyncs.
  - Undefined behaviour: The game logic performs multiple
    things in an undefined order or with an undefined
    result, e.g. when sorting something by a key while
    some keys are equal, or in some computation that depends
    on the CPU architecture (32/64 bit, little/big endian).
- The gamestate is modified when it shall not be modified.
  - The test-run of a command alters the gamestate.
  - The gamestate is altered by a player or script without
    using commands.
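
The "sorting with equal keys" case can be illustrated as follows. The
vehicle tuples and both helper functions are made up for this example.
Python's own sort is stable, so the ambiguity here comes from the
order of the input container; with an unstable sort the result could
differ even for identical input.

```python
# Hypothetical vehicles as (unique_id, cargo_waiting) pairs; two share the same key.
vehicles_on_server = [(7, 50), (3, 50), (5, 20)]
vehicles_on_client = [(3, 50), (7, 50), (5, 20)]  # same set, different container order


def pick_first_ambiguous(vehicles):
    # Ties on cargo_waiting are resolved by whatever order the input had.
    return sorted(vehicles, key=lambda v: -v[1])[0]


def pick_first_deterministic(vehicles):
    # The unique id breaks ties, so the result no longer depends on input order.
    return sorted(vehicles, key=lambda v: (-v[1], v[0]))[0]
```

With the ambiguous key the server and the client pick different
vehicles; with the tie-breaker both pick vehicle 3.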

2.1) Cache debugging
---- ---------------
Desyncs which are caused by improper cache validation can
often be found by enabling cache validation:
- Start OpenTTD with '-d desync=2'.
- This will enable validation of caches every tick.
  That is, cached values are recomputed every tick and compared
  with the cached values.
- Differences are logged to 'commands-out.log' in the autosave
  folder.

Mind that this type of debugging can also be done in singleplayer.
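
The idea behind cache validation can be sketched like this. The
'Train' class, its cached weight and the log message format are
assumptions made for the example and do not reflect OpenTTD's actual
caches or log output.

```python
class Train:
    """Hypothetical vehicle with a cached total weight."""

    def __init__(self, wagon_weights):
        self.wagon_weights = list(wagon_weights)
        self.cached_weight = self.compute_weight()

    def compute_weight(self):
        return sum(self.wagon_weights)

    def add_wagon(self, weight, invalidate=True):
        self.wagon_weights.append(weight)
        if invalidate:
            self.cached_weight = self.compute_weight()


def validate_caches(trains, log):
    # Recompute every cached value and log each mismatch instead of aborting.
    for index, train in enumerate(trains):
        fresh = train.compute_weight()
        if fresh != train.cached_weight:
            log.append(
                f"cache mismatch: train {index}, "
                f"cached {train.cached_weight}, actual {fresh}"
            )


trains = [Train([10, 20]), Train([5])]
trains[1].add_wagon(7, invalidate=False)  # simulated bug: cache not refreshed
log = []
validate_caches(trains, log)
```

Running the validation every tick narrows a cache bug down to the tick
in which the cached value first went stale.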

2.2) Desync recording
---- ----------------
If you have a server which encounters Desyncs often,
you can enable recording of the gamestate alterations. This
will later allow you to replay the gamestate and locate the Desync.

There are two levels of Desync recording, which are enabled
via '-d desync=2' and '-d desync=3' respectively. Both will record all
commands to a file 'commands-out.log' in the autosave folder.

If you have the savegame from the start of the server and
this command log, you can replay the whole game. (See Section 3.1.)

If you do not start the server from a savegame, a savegame will
also be created just after the map has been generated.
The savegame will be named 'dmp_cmds_*.sav' and be put into
the autosave folder.

In addition to that, '-d desync=3' also creates regular savegames
at defined spots in network time (more defined than regular
autosaves). These will be created in the autosave folder
and will also be named 'dmp_cmds_*.sav'.

These saves allow comparing the gamestate with the original
gamestate during replaying, and thus greatly help debugging.
However, they also take a lot of disk space.

To replay a Desync recording, you need these files:
- The savegame from when the server was started, or
  the automatically created savegame from when the map
  was generated.
- The 'commands-out.log' file.
- Optionally the 'dmp_cmds_*.sav'.
Put these files into a safe spot. (Not your autosave folder!)

Next, prepare your OpenTTD for replaying:
- Get the same version of OpenTTD as the original server was running.
- Uncomment/enable the define 'DEBUG_DUMP_COMMANDS' in
  'src/network/network_func.h'.
  (DEBUG_FAILED_DUMP_COMMANDS is explained later.)
- Put the 'commands-out.log' into the root save folder, and rename
  it to 'commands.log'.
- Run 'openttd -D -d desync=3 -g startsavegame.sav'.
  This replays the server log and creates new 'commands-out.log'
  and 'dmp_cmds_*.sav' in your autosave folder.
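
The copy-and-rename step can be scripted, for example as below. The
function name 'prepare_replay' and both directory arguments are
assumptions of this sketch; pass the real location of your recording
and of your OpenTTD root save folder.

```python
import shutil
from pathlib import Path


def prepare_replay(recording_dir, save_root):
    """Copy the recorded command log into place for a replay.

    Both directory arguments are assumptions of this sketch; use the real
    locations of your recording and of your OpenTTD save folder.
    """
    source = Path(recording_dir) / "commands-out.log"
    target = Path(save_root) / "commands.log"
    if not source.is_file():
        raise FileNotFoundError(source)
    Path(save_root).mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)  # copy, so the original recording stays untouched
    return target
```

Copying instead of moving keeps the original recording in its safe
spot, so the replay can be repeated from a clean state.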

3.2) Evaluating the replay
---- ---------------------
The replay will also compare the checksums which are part of
the 'commands-out.log' with the replayed gamestate.
If they differ, it will trigger a 'NOT_REACHED'.

If the replay succeeds without mismatch, that is, the replay reproduces
the original server state:
- Repeat the replay starting from incrementally later 'dmp_cmds_*.sav'
  while truncating the 'commands.log' at the beginning appropriately.
  The 'dmp_cmds_*.sav' can be your own ones from the first replay, or
  the ones from the original server (if you have them).
  (This simulates the view of joining clients during the game.)
- If one of those replays fails, you have located the Desync between
  the last dmp_cmds that reproduces the replay and the first one
  that fails.
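
The search over later savegames can be written down as a small loop.
This is a sketch only: 'replay_ok' is a hypothetical callable that
stands for "replay from this savegame with a suitably truncated
'commands.log' and report whether the checksums matched"; how it is
implemented is up to you.

```python
def locate_desync(savegames, replay_ok):
    """Return (last_good, first_bad) savegame names, or None if every replay succeeds.

    savegames: names ordered by network time, oldest first.
    replay_ok: hypothetical callable that replays from the given savegame and
               returns True if the checksums matched until the end.
    """
    last_good = None
    for name in savegames:
        if not replay_ok(name):
            return (last_good, name)
        last_good = name
    return None
```

For example, with three savegames of which only the replay from the
third one fails, the function returns the second and third name, which
is the interval in which the Desync has to be searched.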

If the replay does not succeed without mismatch, you can check the logs
for failed commands. Then you may try to replay with
DEBUG_FAILED_DUMP_COMMANDS enabled. If the replay then fails, the
command test-run of the failed command modified the game state.

If you have the original 'dmp_cmds_*.sav', you can also compare those
savegames with your own ones from the replay. You can also comment/disable
the 'NOT_REACHED' mentioned above, to get another 'dmp_cmds_*.sav' from
the replay after the mismatch has already been detected.
See Section 3.3 on how to compare savegames.
If the saves differ, you have located the Desync between the last dmp_cmds
that match and the first one that does not. The difference of the saves
may point you in the direction of what causes it.

If the replay succeeds without mismatch, and you do not have any
'dmp_cmds_*.sav' from the original server, it is a lost case.
Enable creation of the 'dmp_cmds_*.sav' on the server, and wait for the
next Desync.

Finally, you can also compare the 'commands-out.log' from the original
server with the one from the replay. They will differ in stuff like
dates, and the original log will contain the chat, but otherwise they
should match.

3.3) Comparing savegames
---- -------------------
The binary form of the savegames from the original server and from
your replay will always differ:
- The savegame contains paths to used NewGRF files.
- The gamelog will log your loading of the savegame.
- The savegame data of AIs and the Gamescript will differ.
  Scripts are not run during the replay, only their recorded commands
  are replayed. Their internal state will thus not change in the
  replay and will differ.

To compare savegames more semantically, there exist some ugly hackish
tools:
http://devs.openttd.org/~frosch/texts/zpipe.c
http://devs.openttd.org/~frosch/texts/printhunk.c

The first one decompresses OpenTTD savegames. The second one creates
a textual representation of an uncompressed savegame, by parsing hunks
and arrays and such. With both tools you need to be a bit careful
since they work on stdin and stdout, which may not deal well with
binary data.

If you have the textual representation of the savegames, you can
compare them with regular diff tools.
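
If no diff tool is at hand, Python's standard 'difflib' module can do
the comparison. The dump lines below are made up for the example; the
real output format of printhunk.c is not assumed here.

```python
import difflib


def diff_dumps(server_text, replay_text):
    """Return unified-diff lines between two textual savegame dumps."""
    return list(difflib.unified_diff(
        server_text.splitlines(),
        replay_text.splitlines(),
        fromfile="server",
        tofile="replay",
        lineterm="",
    ))


# Made-up dump lines, for illustration only.
server_dump = "vehicle 1: x=10\nvehicle 2: x=40\n"
replay_dump = "vehicle 1: x=10\nvehicle 2: x=41\n"

# Keep only the changed lines, dropping the '---' / '+++' file headers.
changes = [
    line for line in diff_dumps(server_dump, replay_dump)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Here 'changes' holds the one line that differs, once as removed and
once as added, which is the kind of hint that points at the part of
the gamestate that went out of sync.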