Adversarial policies in Go

Iterated adversarial training

We performed an iterated adversarial training procedure that alternately trains a victim vn and an adversary an. After nine iterations, our final victim v9 remains vulnerable both to a freshly trained "atari" attack and the "iterated" attack a9. These attacks are described in the following sections. You can also explore games from intermediate steps of the iterated adversarial training.
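The alternation described above can be sketched as a simple loop. This is a minimal illustration, not the training code used for these results: `train_victim` and `train_adversary` are hypothetical stand-ins, and the indexing (victim v_n trained against a_{n-1}, then adversary a_n trained against v_n, starting from some v0 and a0) is an assumption chosen to match the statement that v9 was trained against a8.

```python
from typing import Callable, List, Tuple

Model = str  # stand-in type; a real model would be a network checkpoint


def iterated_adversarial_training(
    v0: Model,
    a0: Model,
    train_victim: Callable[[Model, Model], Model],
    train_adversary: Callable[[Model, Model], Model],
    iterations: int,
) -> Tuple[List[Model], List[Model]]:
    """Alternate victim and adversary training for a fixed number of rounds."""
    victims, adversaries = [v0], [a0]
    for _ in range(iterations):
        # Assumed ordering: v_n is trained to defend against a_{n-1} ...
        victims.append(train_victim(victims[-1], adversaries[-1]))
        # ... and a_n is then trained to attack the new v_n.
        adversaries.append(train_adversary(adversaries[-1], victims[-1]))
    return victims, adversaries


# Hypothetical trainers that only bump the index in the label.
def label_next_victim(prev: Model, opponent: Model) -> Model:
    return f"v{int(prev[1:]) + 1}"


def label_next_adversary(prev: Model, opponent: Model) -> Model:
    return f"a{int(prev[1:]) + 1}"


victims, adversaries = iterated_adversarial_training(
    "v0", "a0", label_next_victim, label_next_adversary, iterations=9
)
```

With nine iterations, the last entries are the labels `v9` and `a9`, matching the names used on this page.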

Atari cyclic attack

We trained a new adversary by fine-tuning from an early adversarial non-cyclic checkpoint. It defeated v9 at 512 visits of search in 81% of games, but the win rate drops to 4% at 4096 visits. This demonstrates that our victim is easily attacked unless it uses a large amount of search. Explore randomly sampled games below.

This adversary starts by inducing the victim to form bamboo joints: pairs of stones separated by two empty spaces. In normal games, these are often efficient shapes due to their strong connections. In the first game here, the first bamboo joint is formed on move 24 between that white stone, the one next to it, and the two below those. Additional joints are formed on moves 28 and 52. By move 102, a large cyclic group emerges, another hallmark of this adversary’s strategy. For over 100 moves, the adversary systematically encloses the cyclic group. Throughout, this adversary often leaves many stones "in atari" in Go terminology (what might be called hanging pieces in chess): stones that could be captured immediately if the victim chose to. At move 217, the attack enters its final phase: the adversary threatens to split the bamboo joints. The victim connects at 218 and 220, each move reducing the overall group’s liberties. The victim makes a fatal mistake at move 222, allowing the adversary to capture everything with move 223.
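For readers less familiar with the terms "liberties" and "in atari", here is a minimal sketch of how they can be computed. It is a generic flood fill over a small text board, not code from KataGo or from this project; the board encoding (`X` for black, `O` for white, `.` for empty) and the function names are assumptions made for this illustration.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start point is empty")
    stones: Set[Point] = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # off the board
            if board[nr][nc] == ".":
                liberties.add((nr, nc))  # empty neighbor is a liberty
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))  # same-colored neighbor joins the group
                stack.append((nr, nc))
    return stones, liberties


def in_atari(board: List[str], start: Point) -> bool:
    """A group is in atari when it has exactly one liberty left."""
    return len(group_and_liberties(board, start)[1]) == 1


board = [
    ".....",
    "..X..",
    ".XOX.",
    ".....",
    ".....",
]
```

Here the white stone at row 2, column 2 has a single liberty directly below it, so it is in atari, while the black stone above it still has three liberties. Each connecting move that fills one of a group's own liberties shrinks the set returned by `group_and_liberties`, which is the effect described for moves 218 and 220.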

[Interactive game viewer. Captures: victim 28, adversary 69. Comment: Black passed. Adversary predicted win prob: 1.00, loss: 0.00, predicted score: 34.5.]

Victim: v9, 512 visits

Adversary: atari-adversary

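| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | b | True | 34.5 | 343 |
| w | b | True | 48.5 | 323 |
| w | b | True | 66.5 | 345 |
| w | b | True | 80.5 | 341 |
| w | b | True | 82.5 | 340 |
| w | b | True | 84.5 | 350 |
| b | w | True | 93.5 | 313 |
| b | w | True | 103.5 | 314 |
| b | b | False | -354.5 | 443 |
| b | b | False | -354.5 | 395 |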
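The columns of the table above can be read as follows, which the short check below confirms for the sampled rows: the adversary wins exactly when the winning color differs from the victim's color, and the score difference is positive exactly when the adversary wins. This reading is inferred from the rows shown here rather than stated by the page, and the tuple layout is a convention chosen for this sketch.

```python
from typing import List, Tuple

# (victim color, win color, adversary win, score difference, game length)
Row = Tuple[str, str, bool, float, int]

atari_rows: List[Row] = [
    ("w", "b", True, 34.5, 343),
    ("w", "b", True, 48.5, 323),
    ("w", "b", True, 66.5, 345),
    ("w", "b", True, 80.5, 341),
    ("w", "b", True, 82.5, 340),
    ("w", "b", True, 84.5, 350),
    ("b", "w", True, 93.5, 313),
    ("b", "w", True, 103.5, 314),
    ("b", "b", False, -354.5, 443),
    ("b", "b", False, -354.5, 395),
]


def row_is_consistent(row: Row) -> bool:
    """Check the inferred relationships between the columns of one row."""
    victim_color, win_color, adversary_win, score_diff, _length = row
    color_agrees = adversary_win == (win_color != victim_color)
    sign_agrees = adversary_win == (score_diff > 0)
    return color_agrees and sign_agrees


all_consistent = all(row_is_consistent(r) for r in atari_rows)
adversary_wins = sum(1 for r in atari_rows if r[2])
```

Note that the ten games listed are only a sample, so the count of adversary wins among them is not an estimate of the win rate quoted in the text.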
bbFalse-354.5395

Stalling cyclic attack

We trained another adversary by fine-tuning a cyclic adversary. Again, it convincingly defeated v9 at 512 visits of search, achieving a win rate of 91.5%, with the win rate dropping to 5% at 4096 visits. The adversary stalls in the first few moves by passing, so we call it stall-adversary. (When the victim is white, it could pass after the adversary passes, ending the game and winning by komi, but one of KataGo's default settings, conservativePass = true, disallows this. Setting conservativePass = false defeats this adversary, but had we trained with conservativePass = false we likely would have simply found a different adversary.)
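The komi argument in the parenthetical can be made concrete with a toy calculation. This is only an illustration under stated assumptions: the komi value of 6.5 is assumed (the page does not state it), scoring is taken to be area scoring, and `allow_game_ending_pass` is a hypothetical flag for this sketch, not a model of how KataGo implements conservativePass.

```python
KOMI = 6.5  # assumed value; the text only says white can "win by komi"


def white_margin(black_area: int, white_area: int, komi: float = KOMI) -> float:
    """Score margin from white's perspective under area scoring."""
    return white_area + komi - black_area


def white_ends_game_by_passing(
    black_area: int,
    white_area: int,
    opponent_just_passed: bool,
    allow_game_ending_pass: bool,
) -> bool:
    """Toy rule: pass to end the game only if allowed and white is ahead."""
    if not (opponent_just_passed and allow_game_ending_pass):
        return False
    return white_margin(black_area, white_area) > 0


# Empty board after the adversary (black) passes first: neither side owns any area.
margin_on_empty_board = white_margin(black_area=0, white_area=0)
```

On an empty board the margin equals komi, so a white victim that is allowed to answer a pass with a pass wins immediately, which is why the stalling opening only works when that reply is ruled out.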

In the first game below, the adversary only makes its first actual play on move 9. It's not completely clear what this accomplishes, given that KataGo only has a move history window of 5 moves. Potentially, the number of stones on the board pushes it off distribution, or towards the distribution of a very early version of KataGo, back when it might randomly pass at the beginning of the game. Alternatively, it might simply help with this adversary's style of play in the non-cyclic areas of the board, where it likes to set up very long groups on the second line. This can be seen, for example, through move 103. These groups are efficient in two ways: they ensure that the victim controls the center and has ample space for setting up the cyclic group, and they ensure that there will be a long chain of moves to play that still give a point benefit, which may help misdirect the victim's search in the critical moments. It is uncertain, though, whether either of these hypotheses has a decisive impact.

Regardless, through move 59 the adversary sets up most of a rather conventional inside group akin to that of the original cyclic adversary. It completes it on move 63. Inside groups similar to this one are also one of the distinctive features of this adversary, in contrast with the other adversaries from the defense paper, whose shapes are distinctly different from the original adversary's. Then, on move 105, the adversary starts in earnest to surround the cyclic group and induce its completion. Through move 188, we see one of the potential benefits of the long second-line group that the adversary set up earlier: the victim plays many moves on the left side while the adversary plays additional moves to surround the cyclic group. At move 232, the victim could still save the cyclic group by attacking black's group in the lower right corner. But after it plays on the top and the adversary fills another liberty, the cyclic group is doomed, even though the adversary waits until move 341 to capture it outright.

[Interactive game viewer. Captures: victim 13, adversary 59. Comment: White passed. Victim predicted win prob: 0.00, loss: 1.00, predicted score: -72.3.]

Victim: v9, 512 visits

Adversary: stall-adversary

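| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | b | True | 72.5 | 354 |
| w | b | True | 96.5 | 322 |
| w | b | True | 128.5 | 367 |
| w | b | True | 174.5 | 357 |
| b | w | True | 85.5 | 337 |
| b | w | True | 141.5 | 350 |
| b | w | True | 151.5 | 352 |
| b | w | True | 199.5 | 354 |
| b | b | False | -194.5 | 279 |
| b | b | False | -26.5 | 369 |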

Iterated attack

We fine-tune the final adversary a8 that v9 was trained against and produce an adversary a9. This adversary defeats v9 even at 65536 visits in 42% of games, indicating a substantial attack surface area remains for v9 at high visit counts. Explore randomly sampled games below.

In the first game, on move 39, we see the adversary complete a diamond, "ponnuki"-like shape of 4 stones in the center. This will become the center of the cyclic group, and is a distinctive shape of this adversary. In the subsequent moves, we see it play around that area, letting the victim separate off and surround the diamond. By move 80, the cyclic part of the victim's group is completed. After that, the adversary slows down, letting the victim expand the cyclic group and only gradually surrounding it. Eventually, on move 193, the adversary completes the encirclement. At this point the cyclic group is already lost, though the final capture takes place later on move 231.

[Interactive game viewer. Captures: victim 13, adversary 47. Comment: White passed. Victim predicted win prob: 0.00, loss: 1.00, predicted score: -22.5.]

Victim: v9, 65536 visits

Adversary: a9

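| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | b | True | 22.5 | 324 |
| w | b | True | 22.5 | 398 |
| w | b | True | 100.5 | 364 |
| b | w | True | 9.5 | 347 |
| b | w | True | 97.5 | 371 |
| w | w | False | -325.5 | 452 |
| w | w | False | -271.5 | 398 |
| b | b | False | -354.5 | 463 |
| b | b | False | -168.5 | 353 |
| b | b | False | -118.5 | 345 |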

Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}
@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}