Adversarial policies in Go

Early adversarial training

David Wu (lightvector), the creator and primary developer of KataGo, has incorporated adversarial training against the cyclic exploit into the official KataGo self-play training run since December 2022. The adversarial training starts a small fraction (~0.1%) of self-play games from positions where the cyclic exploit is being executed; the remaining games are regular self-play games. This adversarial training has been partially successful: the adversarially trained networks are able to beat our original cyclic adversary. However, we are able to fine-tune our original adversary to defeat these updated networks. This suggests that it is non-trivial to defend against the cyclic exploit, unlike the pass exploit, which we were able to patch manually. Developing techniques to train agents that are immune to this attack while maintaining high Go strength remains an interesting open problem.
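The mixing scheme described above can be illustrated with a minimal sketch. This is not KataGo's actual training code: the function names, the position labels, and the use of a simple per-game coin flip are all assumptions made for illustration. Only the ~0.1% fraction comes from the text.

```python
import random

# ~0.1% of games start from an exploit position (fraction taken from the text above).
ADVERSARIAL_START_FRACTION = 0.001


def choose_start(rng, exploit_positions):
    """Pick the starting position for one self-play game.

    `exploit_positions` is a hypothetical list of positions in which the
    cyclic exploit is already under way; "empty_board" stands in for a
    regular self-play start.
    """
    if exploit_positions and rng.random() < ADVERSARIAL_START_FRACTION:
        return rng.choice(exploit_positions)
    return "empty_board"


def count_adversarial_starts(num_games, seed=0):
    """Count how many of `num_games` simulated games start from an exploit position."""
    rng = random.Random(seed)
    exploit_positions = ["cyclic_position_a", "cyclic_position_b"]  # placeholder labels
    starts = (choose_start(rng, exploit_positions) for _ in range(num_games))
    return sum(start != "empty_board" for start in starts)


if __name__ == "__main__":
    print(count_adversarial_starts(200_000))
```

With 200,000 simulated games, roughly 200 would be expected to start from an exploit position, which shows how small the adversarial share of the training data is under this scheme.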

This page shows our results against the KataGo network kata1-b60c320-s7701878528-d3323518127, abbreviated to b60-s7702m and released in May 2023. These results are superseded by our results against a December 2023 network, but we preserve them here since they are linked in our first paper.

Original cyclic adversary loses to no-search b60-s7702m

b60-s7702m has had several months of adversarial training and defeats the original cyclic adversary in 1882/2000 = 94.1% of games even when b60-s7702m plays without search. b60-s7702m is stronger at defending the cyclic group. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)

[Game viewer. Captures: victim 27, adversary 64. Comment: White passed. victim predicted win prob: 0.00 loss: 1.00, predicted score: -103.1]

Victim: b60-s7702m, no search

Adversary: Cyclic adversary, 545 million training steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           104.5             318
w             b          True           118.5             293
b             w          True           69.5              371
b             w          True           135.5             318
w             w          False          -367.5            476
w             w          False          -297.5            446
b             b          False          -242.5            399
b             b          False          -224.5            417

Fine-tuned cyclic adversary vs. 4096-visit b60-s7702m

After 168 million fine-tuning training steps, the cyclic adversary achieves a win rate of 188/400 = 47% against b60-s7702m with 4096 victim visits. The attack is still a cyclic attack, though the placement of the cyclic group has moved from the corner of the board to the center of one side of the board.

[Game viewer. Captures: victim 4, adversary 44. Comment: White passed. victim predicted win prob: 0.00 loss: 1.00, predicted score: -84.5]

Victim: b60-s7702m, 4096 visits

Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           84.5              306
w             b          True           105.5             333
b             w          True           95.5              309
b             w          True           112.5             294
w             w          False          -147.5            357
w             w          False          -113.5            308
b             b          False          -134.5            319
b             b          False          -94.5             317

Fine-tuned cyclic adversary vs. 100,000-visit b60-s7702m

The fine-tuned cyclic adversary also beats b60-s7702m using 100,000 victim visits with a win rate of 7/40 = 17.5%. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)

[Game viewer. Captures: victim 13, adversary 50. Comment: White passed. victim predicted win prob: 0.00 loss: 1.00, predicted score: -54.5]

Victim: b60-s7702m, 100,000 visits, 10 search threads

Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits

Victim color  Win color  Adversary win  Score difference  Game length
w             b          True           54.5              340
w             b          True           110.5             306
b             w          True           93.5              313
b             w          True           131.5             303
w             w          False          -181.5            418
w             w          False          -137.5            328
b             b          False          -168.5            411
b             b          False          -118.5            365
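The win rates quoted on this page come from samples of very different sizes (2000, 400, and 40 games), so the uncertainty around each differs. The sketch below recomputes the adversary's win rate for each matchup and attaches a 95% Wilson score interval. The interval is our own illustrative addition rather than something reported in the original results. The first entry assumes every game has a winner, so the adversary's wins are taken to be 2000 - 1882 = 118.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half


# (adversary wins, games played), taken from the counts quoted on this page.
REPORTED = {
    "original adversary vs. no-search victim": (2000 - 1882, 2000),  # assumes no drawn or void games
    "fine-tuned adversary vs. 4096-visit victim": (188, 400),
    "fine-tuned adversary vs. 100,000-visit victim": (7, 40),
}

if __name__ == "__main__":
    for matchup, (wins, games) in REPORTED.items():
        low, high = wilson_interval(wins, games)
        print(f"{matchup}: {wins}/{games} = {wins / games:.1%} "
              f"(95% interval {low:.1%} to {high:.1%})")
```

The 40-game sample against the 100,000-visit victim yields a much wider interval than the other two, so the 17.5% figure should be read as a rough estimate.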

Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}
@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}