Adversarial policies in Go

Cyclic attack

This section showcases games our cyclic adversary played against KataGo. We primarily attack KataGo network checkpoint b40c256-s11840935168-d2898845681, which we dub Latest since it is the latest confidently rated KataGo network at the time of conducting our experiments.

KataGo without search (top-100 European player level)

Without tree search, KataGo's Latest network plays at the strength of a top-100 European professional. We trained an adversary that wins 100% of the time over 1000 games against this victim [1]. Our adversary (which we refer to as the "cyclic adversary") gets the victim to form a large circular structure, and then tricks the victim into allowing the circular structure to be killed. See the "Game Analysis" tab for a more in-depth analysis of this adversarial strategy.
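To put the 1000-game result in statistical terms, the sketch below computes a one-sided lower confidence bound on the adversary's true win probability when every game in a sample was a win. It assumes the games are independent with a fixed win probability, which is a simplification; the helper name `lower_bound_all_wins` is hypothetical and not part of our codebase.

```python
def lower_bound_all_wins(n_games: int, confidence: float = 0.95) -> float:
    """One-sided exact lower bound on the win probability after n wins in n games.

    If the true win probability is p, the chance of winning all n games is p**n.
    Solving p**n = alpha (with alpha = 1 - confidence) gives the bound below.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n_games)


if __name__ == "__main__":
    # 1000 wins out of 1000 games, as reported against the no-search victim.
    print(f"{lower_bound_all_wins(1000):.4f}")
```

Under these assumptions, 1000 wins in 1000 games puts the 95% lower bound on the win probability at about 0.997.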

[1] The games below are actually against a version of Latest that was patched to be immune to a simpler pass-based attack. We applied this patch to the victim to force our adversary to learn a more interesting attack. The patch is a hardcoded defense that forbids the victim from passing until it has no more legal moves outside its territory. We call the patched victim Latestdef. Because we limit the victim's passing, games are usually played out to the end, terminating automatically once all points belong to a pass-alive-group or pass-alive-territory.
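The passing restriction described in this footnote can be sketched as a filter over the victim's legal moves. This is a minimal illustration rather than the actual patch: the names `allowed_moves` and `PASS`, the representation of moves as coordinate tuples, and the representation of territory as a set of points are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple, Union

Point = Tuple[int, int]
PASS = "pass"  # hypothetical sentinel for the pass move
Move = Union[Point, str]


def allowed_moves(legal_moves: Iterable[Move], own_territory: Set[Point]) -> List[Move]:
    """Return the moves the defended victim may choose from.

    Passing is removed from the candidates while at least one legal
    non-pass move lies outside the victim's own territory.
    """
    legal = list(legal_moves)
    has_outside_move = any(m != PASS and m not in own_territory for m in legal)
    if has_outside_move:
        # The victim can still play outside its territory, so passing is forbidden.
        return [m for m in legal if m != PASS]
    # Every remaining legal move is inside the victim's territory: passing is allowed.
    return legal


if __name__ == "__main__":
    territory = {(1, 1)}
    print(allowed_moves([(0, 0), (1, 1), PASS], territory))  # pass is filtered out
    print(allowed_moves([(1, 1), PASS], territory))          # pass is kept
```

In this sketch only the pass move is ever filtered, which matches the description above of a defense that limits passing and otherwise leaves move selection to the network.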


Victim: Latestdef, no search

Adversary: 545 million training steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           106.5             385
w             b          True           110.5             347
w             b          True           128.5             345
b             w          True           121.5             376
b             w          True           137.5             378
b             w          True           137.5             352

KataGo with 4096 visits (superhuman)

With 4096 visits, KataGo's Latest network plays at a superhuman level. Nonetheless, our adversary still achieves a 97.3% win rate against Latest and a 95.7% win rate against the defended victim Latestdef. Games against Latestdef are shown below.


Victim: Latestdef, 4096 visits

Adversary: 545 million training steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           108.5             373
w             b          True           116.5             383
w             b          True           124.5             347
b             w          True           47.5              492
b             w          True           123.5             356
b             w          True           123.5             352
b             w          True           129.5             372
b             w          True           133.5             500
w             w          False          -231.5            396
w             w          False          -155.5            467

KataGo with 10,000,000 visits

Our adversary with 600 visits still achieves a 72% win rate against Latest with 10,000,000 visits, demonstrating that a large amount of search is not a practical defense against the adversary.


Victim: Latest, 10,000,000 visits, 1024 search threads

Adversary: 545 million training steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           114.5             306
w             b          True           122.5             334
w             b          True           136.5             355
b             w          True           89.5              335
b             w          True           125.5             384
b             w          True           167.5             374
w             w          False          -127.5            440
b             b          False          -282.5            553

Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}
@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}