As PAXsims readers will know, the recent Connections UK professional wargaming conference featured a large political/military crisis game exploring crisis stability in East and Southeast Asia: DIRE STRAITS. This is the second time we have held a megagame at Connections UK, and—judging from last year’s survey—they are popular with participants. This year we organized something that addressed a series of near-future (2020) challenges, set against the backdrop of uncertainties in Trump Administration foreign policy and the growing strategic power of China.
We also conducted an experiment.
Specifically, we decided to use the game to explore the extent to which different analytical teams would reach similar, or different, conclusions about the methodology and substantive findings of the game. If their findings converged, that would provide some evidence that wargaming can generate solid analytical insights. If their findings diverged a great deal, however, that would suggest that wargaming suffers from a possible “eye of the beholder” problem, whereby the interpretation of game findings might be heavily influenced by the subjective views and idiosyncratic characteristics of the analytical team—whether that be training/background/expertise, preexisting views, or the particular mix of people and personalities involved. The latter finding could have quite important implications, in that game results might have as much to do with who was assessing them and how, as with the actual outcome of the game.
To do this, we formed three analytical teams: TEAM UK (composed of one British defence analyst and one serving RAF officer), TEAM EURO (composed of analysts from the UK, Finland, Sweden, and the Netherlands), and TEAM USA (composed of three very experienced American wargamers/analysts). Each team was free to move around and act as observers during the game, had full access to game materials, briefings, player actions, and assessments, and could review the record of game events produced during DIRE STRAITS by our media team.
We were well aware at the outset that DIRE STRAITS would be an imperfect analytical game. It was, after all, required to address multiple objectives: to accommodate one hundred or so people, most of whom would not be subject matter experts on the region; to be relatively simple; to be enjoyable; and to make do with the time and physical space assigned to us by the conference organizers. It was also designed on a budget of, well, nothing—the time and materials were all contributed by Jim Wallman and myself. From an experimental perspective, however, the potential shortcomings in the game were actually assets, since they represented a number of potential methodological and substantive issues on which the analytical teams might focus. To make it clearer what their major take-aways were, we asked each team to provide a list of their top five observations in each of two categories (game methodology, and substantive game findings).
And the results are now in:
All three teams did a very good job, and there is a great deal of insight and useful game design feedback contained within the reports. But what do they suggest about our experimental question? I have a lot more analysis of the findings to undertake, but here is a very quick, initial snapshot.
First, below is a summary of each team’s five main conclusions regarding game methodology. I have coded the results in dark green if there is full agreement across all three teams, light green for substantial agreement, yellow for some agreement, and red for little/no agreement. The latter does not mean that the teams necessarily would disagree on a point, only that it did not appear in the key take-aways of each. I have also summarized each conclusion into a single sentence—in the report, each is a full paragraph or more.
A Venn diagram gives a graphic sense of the degree of overlap in the team methodological assessments.
One interesting point of divergence was the teams’ assessment of the White House subgame. TEAM USA had a number of very serious concerns about it. TEAM EURO, on the other hand—while noting the risks of embedding untested subgames in a larger game dynamic—nevertheless concluded that they “found this modelling fairly accurate.” TEAM UK took a somewhat intermediate position: while arguing that the White House subgame should have been more careful in its depiction of current US political dynamics to avoid the impression of bias, this “obscured the fact that there were actually quite subtle mechanisms in the White House game, and that the results were the effects of political in-fighting and indeed, it could even show the need to ‘drain the swamp’ to get a functional White House.” The various points made by the teams on this issue, and the subtle but important differences between them, will be the subject of a future PAXsims post.
Next, let us compare the three teams’ assessments of the substantive findings of the game. TEAM USA argued that the methodological problems with the game were such that no conclusions could be drawn. TEAM EURO felt that the actions of some teams were unrealistic (largely due to a lack of subject matter expertise and cultural/historical familiarity), but concluded that “the overall course of action seemed to stay within reasonable bounds of what can be expected in the multitude of conflicts in the area.” TEAM UK was careful to distinguish between game outcomes that appeared to be intrinsic to the game design and those that emerged from player interaction and emergent gameplay, and was able to identify several key outcomes among the latter.
As both the table above and the diagram below indicate, there was much greater divergence here (much of it hinging on assessments of game methodology, player behaviour, or plausibility).
Again, I want to caution that this is a very quick take on some very rich data and analysis, and I might modify some of my initial impressions upon a deeper dive. However, I do think there is enough here both to underscore the potential value of crisis gaming as an analytical tool, and to sound some fairly loud warning bells about potential interpretive divergence in post-game analysis. At the very least, it suggests the value of using mixed methods to analyze game outcomes, and/or—better yet—a sort of analytical red teaming. If different groups of analysts are asked to draw separate conclusions, and those findings are then compared, convergence can be used as a rough proxy for higher-confidence interpretations, while areas of divergence can then be examined in greater detail. I am inclined to think, moreover, that producing separate analyses and then bringing those together is likely to be more useful than simply combining the groups into a larger analytical team at the outset, since it somewhat reduces the risk that findings are driven by a dominant personality or senior official.
One final point: DIRE STRAITS assigned no fewer than nine analysts to pick apart its methodology and assess its findings in light of those strengths and weaknesses, and we have now published that feedback. Such explicit self-criticism is almost unheard of in think-tank POL/MIL gaming, and far too rare in most professional military wargaming too. Hopefully the willingness of Connections UK to do this will encourage others to do so as well!