Adversarial policies in Go

Human games

Human amateur beats cyclic adversary

Our strongest adversarial policy (trained against Latest_def) reliably beats KataGo at superhuman strength settings. However, a member of our team (Tony Wang), who is a novice Go player, convincingly beat this same adversary. This confirms that our adversarial policy is not generally capable, even though it beats victim policies that can themselves beat top human professionals. Instead, it wins by exploiting a subtle vulnerability in the victim policy.

Our evaluation is imperfect in one significant way: the adversary was not playing with an accurate model of its human opponent (rather, it modeled Tony as Latest with 1 visit). However, given how poorly our adversary transfers to different KataGo checkpoints (see Figure 5.1 of the paper), we predict that the adversary would not win even if it had access to an accurate model of its human opponent.

Victim: Tony Wang (author)

Adversary: Cyclic adversary, 545 million training steps, 600 visits

| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | w | False | -65.5 | 194 |
| b | b | False | -36.5 | 253 |

Human amateur beats pass adversary

The same Go novice (Tony Wang) also managed to beat our pass adversary by a large margin of over 250 points. This demonstrates our pass adversary is also not generally capable.

Victim: Tony Wang (author)

Adversary: Pass adversary, 34.1 million training steps, 600 visits

| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | w | False | -314.5 | 428 |
| b | b | False | -253.5 | 473 |

Human exploits KataGo

A Go expert (Kellin Pelrine) was able to learn and apply the cyclic adversary's strategy to attack multiple types and configurations of AI Go systems. In this example he exploited KataGo with 100K visits, which would normally be strongly superhuman. Apart from having studied our adversary's game records beforehand, he used no algorithmic assistance in this or any of the following examples. The KataGo network and weights used here were b18c384nbt-uec, a newly released version that the author of KataGo (David Wu) trained for a tournament. This network should be at least as strong as Latest.

Victim: KataGo, 100K visits

Adversary: Kellin Pelrine (author)

| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| b | w | True | resignation | 211 |

Human exploits Leela Zero

The same Go expert (Kellin Pelrine) also exploited Leela Zero with 100K visits, which would likewise normally be superhuman.

Victim: Leela Zero, 100K visits

Adversary: Kellin Pelrine (author)

| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| b | w | True | resignation | 199 |

Human exploits Leela Zero 2

Kellin Pelrine also played 9 games against Leela Zero with 4096 visits, winning 6.

Victim: Leela Zero, 4096 visits

Adversary: Kellin Pelrine (author)

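| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | b | True | resignation | 186 |
| w | b | True | resignation | 242 |
| b | w | True | resignation | 223 |
| b | w | True | resignation | 223 |
| b | w | True | resignation | 269 |
| b | w | True | resignation | 255 |
| w | w | False | resignation | 225 |
| w | w | False | resignation | 129 |
| b | b | False | resignation | 263 |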

Human exploits a top KGS bot

Playing under standard human conditions on the online Go server KGS, the same Go expert (Kellin Pelrine) successfully exploited the bot JBXKata005 in 14/15 games. In the remaining game, the cyclic group attack still led to a successful capture, but the victim had enough points remaining to win. This bot uses a custom KataGo implementation, and at the time of the games was the strongest bot available to play on KGS.

Victim: JBXKata005, 9 dan on KGS

Adversary: Kellin Pelrine (author)

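| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| w | b | True | resignation | 221 |
| w | b | True | resignation | 237 |
| w | b | True | resignation | 277 |
| w | b | True | resignation | 219 |
| w | b | True | resignation | 237 |
| w | b | True | resignation | 245 |
| b | w | True | resignation | 200 |
| b | w | True | resignation | 182 |
| b | w | True | resignation | 206 |
| b | w | True | resignation | 234 |
| b | w | True | resignation | 206 |
| b | w | True | resignation | 218 |
| b | w | True | resignation | 222 |
| b | w | True | resignation | 266 |
| b | b | False | -26.5 | 216 |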
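
As a quick check on the 14/15 figure, the results listed above can be tallied by victim color. This is only an illustrative sketch: the `games` list is hand-transcribed from the table, and the variable names are our own rather than part of any released tooling.

```python
from collections import Counter

# (victim_color, adversary_won) for the 15 KGS games in the table above,
# transcribed by hand: 6 wins with the victim as white, 8 wins and 1 loss
# with the victim as black.
games = [("w", True)] * 6 + [("b", True)] * 8 + [("b", False)]

# Count adversary wins and total games per victim color.
wins = Counter(color for color, adversary_won in games if adversary_won)
totals = Counter(color for color, _ in games)

total_wins = sum(wins.values())
print(f"overall: {total_wins}/{len(games)} = {total_wins / len(games):.1%}")
for color in ("w", "b"):
    print(f"victim as {color}: {wins[color]}/{totals[color]}")
```

The single loss occurred with the victim playing black, so the attack succeeded in every game where the victim played white.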

Human exploits top KGS bot with large handicap

In this last example, the same Go expert (Kellin Pelrine) exploited JBXKata005 while giving it a huge initial advantage in the form of a 9-stone handicap. A top-level human player with this much of an advantage would have a virtually 100% win rate against any opponent, human or algorithmic.

Victim: JBXKata005, 9 dan on KGS, with 9 stone handicap

Adversary: Kellin Pelrine (author)

| Victim color | Win color | Adversary win | Score difference | Game length |
|---|---|---|---|---|
| b | w | True | resignation | 227 |

Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}
@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}