Adversarial policies in Go

Game analysis

Qualitative analysis of adversary behavior

An expert-level (6 dan) human player on our team (Kellin Pelrine) analyzed the following game. It shows the typical behavior and outcome when an adversary trained against a pass-hardened KataGo victim plays that victim: the victim builds an early and seemingly insurmountable lead, the adversary sets a trap that would be easy for a human to see and avoid, but the victim is oblivious and collapses.

The adversary plays non-standard, subpar moves right from the beginning. By move 9, the victim estimates its win rate at over 90%, and a human in a high-level match would likewise hold a large advantage from this position.

On move 20, the adversary initiates a tactic we see consistently: producing a square-four group in one quadrant of the board that is 'dead', at least by normal judgment. Elsewhere, the adversary plays low, mostly on the second and third lines. This too is common in its games, and it leads the victim to turn the rest of the center into its sphere of influence. We suspect this helps the adversary later play moves in that area without the victim responding directly, because the victim is already strong there and feels confident ignoring a number of moves.

On move 74, the adversary begins mobilizing its 'dead' stones to set up an encirclement. Over the next 100+ moves, it gradually surrounds the victim in the top left. A key pattern here is that it leads the victim into forming an isolated group that loops around and connects to itself (a group whose adjacency graph contains a cycle rather than being a tree). David Wu, the creator of KataGo, suggested that Go-playing agents like the victim struggle to accurately judge the status of such groups, but they are normally very rare. This adversary seems to produce them consistently.
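The cycle-versus-tree distinction above can be made concrete. As an illustration (this is not code from the paper), the following sketch represents a connected group of stones as a set of board coordinates and uses the fact that a connected graph is a tree exactly when it has one fewer edge than it has vertices:

```python
def group_has_cycle(stones):
    """Return True if a *connected* group of stones, given as a set of
    (row, col) coordinates, contains a cycle in its adjacency graph.

    A connected graph with V vertices is a tree iff it has V - 1 edges,
    so any extra edge implies a cycle. This is a simplified graph-theoretic
    check for illustration; it does not capture everything about the
    loop-shaped groups discussed in the text.
    """
    stones = set(stones)
    # Count each orthogonal adjacency exactly once by only looking
    # at the neighbor below and the neighbor to the right.
    edges = sum(
        1
        for (r, c) in stones
        for neighbor in ((r + 1, c), (r, c + 1))
        if neighbor in stones
    )
    return edges >= len(stones)


# A straight string of three stones is a tree: no cycle.
line = {(0, 0), (0, 1), (0, 2)}

# A ring of stones around an empty point loops back on itself: a cycle.
ring = {(0, 0), (0, 1), (0, 2),
        (1, 0),         (1, 2),
        (2, 0), (2, 1), (2, 2)}
```

Here `group_has_cycle(line)` is False while `group_has_cycle(ring)` is True, matching the intuition that the ring "connects to itself."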

Until the adversary plays move 189, the victim can still save the cyclic group, and with it still win by a huge margin. There are straightforward moves to do so that would be trivial to find for any human playing at the victim's normal level; even a human who has played for only a few months might find them. For instance, on move 189 the victim could instead have played at the point marked 'A'. But after move 189, escape is impossible and the game is reversed. The victim seems to have been unable to detect the danger. Play continues for another 109 moves, but there is no chance for the victim (nor would there be for a human player) to recover from the massive deficit it was tricked into.


Victim: Latestdef, 1600 visits

Adversary: 498 million training steps, 600 visits

Victim color   Win color   Adversary win   Score difference   Game length
b              w           True            83.5               298

How the victim's predicted win rate varies over time

In this game, we find that the victim's predicted win rate oscillates several times before the victim's group is captured at move 273. At move 248, the victim predicts it will win with 91% confidence, yet by its next turn at move 250 its prediction has dropped below 1%. At move 254, it jumps back above 99%. A few moves later, the victim's win rate prediction again fluctuates dramatically, hitting below 1% at move 266, 99% at move 268, and below 1% at move 272. After the capture on the following turn, the victim (correctly) predicts a win rate below 1% until the end of the game.
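As an illustration of the oscillation described above (this helper and its threshold are our own, not from the paper), one can scan a sequence of (move, predicted win rate) pairs and flag every dramatic swing between consecutive evaluations:

```python
def flag_swings(predictions, threshold=0.5):
    """Given a list of (move_number, predicted_win_rate) pairs in move
    order, return (prev_move, move, change) for every consecutive pair
    whose prediction changes by more than `threshold`.

    `threshold` is an arbitrary illustrative cutoff, not a value used
    in the paper.
    """
    swings = []
    for (m0, p0), (m1, p1) in zip(predictions, predictions[1:]):
        if abs(p1 - p0) > threshold:
            swings.append((m0, m1, p1 - p0))
    return swings


# The win rate values reported in the text above:
predictions = [(248, 0.91), (250, 0.01), (254, 0.99),
               (266, 0.01), (268, 0.99), (272, 0.01)]
```

Applied to these values, `flag_swings` flags all five consecutive transitions, each a swing of 90 percentage points or more.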


Victim: Latest, 4096 visits

Adversary: 545 million training steps, 600 visits

Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            140.5              347

Positions analyzed with varying visits

We make available here the full game records for the positions analyzed at different visit counts in the paper appendix discussing the role of search in robustness. For details, please refer to that appendix.


Victim: Latestdef, 1600 visits

Adversary: 498 million training steps, 600 visits

Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            58.5               335
b              w           True            91.5               269
w              b           True            56.5               326
b              w           True            127.5              334
w              b           True            98.5               333
b              w           True            95.5               286
w              b           True            84.5               274
b              w           True            97.5               305
b              w           True            93.5               297
w              b           True            36.5               304
b              w           True            127.5              314
b              w           True            154.5              320
w              b           True            116.5              336
w              b           True            38.5               316
b              w           True            27.5               333
w              b           True            40.5               326

Positions analyzed with 1 billion visits

The following game records correspond to positions in which the victim still failed to find the correct move even when analyzed with 1 billion visits. The victim that originally played these games used 1 million visits. For details, please refer to the paper appendix discussing the role of search in robustness.


Victim: Latest, 1 million visits

Adversary: 545 million training steps, 600 visits

Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            97.5               311
w              b           True            116.5              306
b              w           True            189.5              360
b              w           True            133.5              433
w              b           True            108.5              334

Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}

@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}