PAXsims

Conflict simulation, peacebuilding, and development

Experimenting with DIRE STRAITS

As PAXsims readers will know, the recent Connections UK professional wargaming conference featured a large political/military crisis game exploring crisis stability in East and Southeast Asia: DIRE STRAITS. This is the second time we have held a megagame at Connections UK, and—judging from last year’s survey—they are popular with participants. This year we organized something that addressed a series of near-future (2020) challenges, set against the backdrop of uncertainties in Trump Administration foreign policy and the growing strategic power of China.


We also conducted an experiment.

Specifically, we decided to use the game to explore the extent to which different analytical teams would reach similar, or different, conclusions about the methodology and substantive findings of the game. If their findings converged, that would provide some evidence that wargaming can generate solid analytical insights. If their findings diverged a great deal, however, that would suggest that wargaming suffers from a possible “eye of the beholder” problem, whereby the interpretation of game findings might be heavily influenced by the subjective views and idiosyncratic characteristics of the analytical team—whether that be training/background/expertise, preexisting views, or the particular mix of people and personalities involved. The latter finding could have quite important implications, in that game results might have as much to do with who was assessing them and how, as with the actual outcome of the game.

To do this, we formed three analytical teams: TEAM UK (composed of one British defence analyst and one serving RAF officer), TEAM EURO (composed of analysts from the UK, Finland, Sweden, and the Netherlands), and TEAM USA (composed of three very experienced American wargamers/analysts). Each team was free to move around and act as observers during the game, had full access to game materials, briefings, and player actions and assessments, and could review the record of game events produced during DIRE STRAITS by our media team.

We were well aware at the outset that DIRE STRAITS would be an imperfect analytical game. It was, after all, required to address multiple objectives: to accommodate one hundred or so people, most of whom would not be subject matter experts on the region; to be relatively simple; to be enjoyable; and to make do with the time and physical space assigned to us by the conference organizers. It was also designed on a budget of, well, nothing—the time and materials were all contributed by Jim Wallman and myself. From an experimental perspective, however, the potential shortcomings in the game were actually assets, since they represented a number of potential methodological and substantive issues on which the analytical teams might focus. To make it clearer what their major takeaways were, we asked each team to provide a list of their top five observations in each of two categories (game methodology, and substantive game findings).

And the results are now in:

All three teams did a very good job, and there is a great deal of insight and useful game design feedback contained within the reports. But what do they suggest about our experimental question? I have a lot more analysis of the findings to undertake, but here is a very quick, initial snapshot.

First, below is a summary of each team’s five main conclusions regarding game methodology. I have coded the results in dark green if there is full agreement across all three teams, light green for substantial agreement, yellow for some agreement, and red for little/no agreement. The latter does not mean that the teams would necessarily disagree on a point, only that it did not appear among each team’s key takeaways. I have also summarized each conclusion into a single sentence—in the report, each is a full paragraph or more.

[Table: DS methodology findings]

A Venn diagram gives a graphic sense of the degree of overlap in the team methodological assessments.

[Figure: Venn diagram of methodological assessments]

One interesting point of divergence was the teams’ assessment of the White House subgame. TEAM USA had a number of very serious concerns about it. TEAM EURO, on the other hand—while noting the risks of embedding untested subgames in a larger game dynamic—nevertheless concluded that they “found this modelling fairly accurate.” TEAM UK took a somewhat intermediate position: while arguing that the White House subgame should have been more careful in its depiction of current US political dynamics to avoid the impression of bias, this “obscured the fact that there were actually quite subtle mechanisms in the White House game, and that the results were the effects of political in-fighting and indeed, it could even show the need to “drain the swamp” to get a functional White House.” The various points made by the teams on this issue, and the subtle but important differences between them, will be the subject of a future PAXsims post.

Next, let us compare the three teams’ assessments of the substantive findings of the game. TEAM USA argued that the methodological problems with the game were such that no conclusions could be drawn. TEAM EURO felt that the actions of some teams were unrealistic (largely due to a lack of subject matter expertise and cultural/historical familiarity), but that “the overall course of action seemed to stay within reasonable bounds of what can be expected in the multitude of conflicts in the area.” TEAM UK was careful to distinguish between game outcomes that appeared to be intrinsic to the game design and those that arose from player interaction and emergent gameplay, and was able to identify several key outcomes among the latter.

[Table: DS substantive findings]

As both the table above and the diagram below indicate, there was much greater divergence here (much of it hinging on assessments of game methodology, player behaviour, or plausibility).

[Figure: Venn diagram of substantive assessments]

Again, I want to caution that this is a very quick take on some very rich data and analysis, and I might modify some of my initial impressions upon a deeper dive. However, I do think there is enough here to both underscore the potential value of crisis gaming as an analytical tool, and to sound some fairly loud warning bells about potential interpretive divergence in post-game analysis. At the very least, it suggests the value of using mixed methods to analyze game outcomes, and/or—better yet—a sort of analytical red teaming. If different groups of analysts are asked to draw separate conclusions, and those findings are then compared, convergence can be used as a rough proxy for higher confidence interpretations, while areas of divergence can then be examined in great detail. I am inclined to think, moreover, that producing separate analyses then bringing those together is likely to be more useful than simply combining the groups into a larger analytical team at the outset, since it somewhat reduces the risk that findings are driven by a dominant personality or senior official.

One final point: for DIRE STRAITS we assigned no fewer than nine analysts to pick apart its methodology and assess its findings in light of those strengths and weaknesses, and we have now published that feedback. Such explicit self-criticism is almost unheard of in think-tank POL/MIL gaming, and far too rare in most professional military wargaming too. Hopefully the willingness of Connections UK to do this will encourage others to do so as well!

4 responses to “Experimenting with DIRE STRAITS”

  1. brtrain 08/10/2017 at 1:34 am

    Thank you for this; very interesting – even if I don’t remember seeing a brain floating in a jar participating in the game; perhaps it was in the White House room.
    You need to enlist that Downes-Martin character for his thoughts on this (unless he was on Team USA already).

  2. brtrain 08/10/2017 at 1:57 am

    Whoops, I see that he was (now that I’ve taken the time to read the full reports!).
    I’ll admit that I and the others on my team were often diverted by the DPRK subgame, and it did lead to a bit of tension between me (as Dear Leader) and one or another of my satraps whom I was constrained by the rules to punish. They were all quite intelligent and effective in their roles and I didn’t want to have to pick one to be punished each turn. And as you know, only one tried to unseat me via the Central Committee process; I think it might have been different if the players had had different temperaments.
    Our outbursts of clapping, and our snarky press releases and tweets, were of course Information Ops and done not so much in the vein of “hey, look at us” but rather because it was about all we had to offer each turn against constant American and American-puppet pressure and aggression, while we worked on other projects (e.g. the SSBN with new, improved screen doors and the rockoon that could have been used against the American satellite network).

  3. John D Salt 22/11/2017 at 3:53 pm

    Assuming for a moment that Steve Downes-Martin has now spent so long in leftpondia that he can now be accounted as culturally American rather than British, I think this experiment may demonstrate something I find interesting but which is quite different from what you were looking for. What I have in mind is the very considerable cultural difference in practice between UK and US defence analysis. I have seen this in approaches to simulation modelling, particularly validation of simulation models, and in OR generally, and it also somewhat corresponds to the leftpondia/rightpondia differences in what is meant by “systems engineering”.
    At risk of making a grossly over-simplified generalisation (which as a rightpondian analyst I cheerfully accept), US OR practice is firmly rooted in the ideas of “hard” science, numerical evidence, and perhaps even logical positivism. UK OR practice takes a much softer, interpretive view. In the light of this, and taking megagaming as a pretty “soft” method, it makes perfect sense to me (as both post-hoc rationalisation and confirmation bias say it should) that it was the American team who concluded that no firm conclusions could be drawn. I am not quite sure (having forgotten to calibrate my prejudices beforehand) whether I expected the Euro team to be mid-way between UK and US practice, or the UK to be mid-way between US and Euro, but that is probably because I am not sufficiently aware of what goes on on the other side of the ditch (as distinct from the pond).
    I notice that all the teams mentioned “insufficient subject knowledge”. Isn’t one of the motivations for doing this sort of exercise to provide, if not subject knowledge itself, at least a hunger for it in the participants? I’ll bet several thousand pounds of somebody else’s money that the participants came away from the experience *wanting* to know more about the geopolitical situation being modelled, even if they didn’t actually increase their knowledge (and I’d bet a small amount of my own money that most of them did). Rightpondian (though not so much leftpondian) OR practice puts a fair amount of emphasis on the use of “soft” methods as PSMs, which in this case means Problem-Structuring Methods, rather than Platoon Serjeant-Majors. Perhaps the world is not yet ready for them, but I think we could do with a few more ISMs, or “Ignorance-Structuring Methods” — and I think wargames are a great method for getting people to realise just how much they don’t know about a subject. It certainly works for me.
    To my acute embarrassment, the two things I would most advise reading to back up my view about the usefulness of the “soft” approach to simulation/OR/wargaming/analysis, which I describe as the dominant rightpondian mode, are both written by Americans: Russell Ackoff’s “The Future of OR is Past” address to the OR society, and Charles Blilie’s book “The Promise and Limits of Computer Simulation”. Well, a prophet is always without honour in his own land. And we gave them Steven Downes-Martin.

  4. Rex Brynen 22/11/2017 at 4:07 pm

    John: That’s an excellent point (or, more accurately, series of points). While part of my motivation in dividing the three analytical teams the way I did was practical (easier collaboration), I did also want some methodological, political, and other variation between them. If quite different teams had produced similar reports, that would have provided strong evidence that there isn’t a potential “eye of the beholder” problem. The fact that they did come to somewhat different conclusions, I think, points to the need to more fully consider the ways in which the analysis process may frame/tilt/spin/filter the lessons learned from a game.
