Adversarial policies in Go

Pass attack

Our initial attempts at attacking KataGo resulted in adversaries that exploited KataGo's passing behavior. These pass-based adversaries trick KataGo into passing when it shouldn't. While this attack is effective against victims that do not use tree search, it stops working once victims use even a small amount of search. We developed the pass-hardening defense so that our adversaries would not get stuck learning this pass exploit. This worked surprisingly well: training against pass-hardened victims led our adversaries to learn an alternate strategy that works even in the high-search regime.

Without tree search, KataGo's Latest network plays at the strength of a top-100 European professional. Our pass-based adversary achieves a 99% win rate against this victim with a counterintuitive strategy: it stakes out a minority territory in the corner, allows KataGo to stake out the complement, and places weak stones inside KataGo's stake.

KataGo predicts a high win probability for itself and, in a way, it’s right—it would be simple to capture most of the adversary’s stones in KataGo’s stake, achieving a decisive victory. However, KataGo plays a pass move before it has finished securing its territory, allowing the adversary to pass in turn and end the game. This results in a win for the adversary under the standard Tromp-Taylor ruleset for computer Go, as the adversary gets points for its corner territory (devoid of victim stones) whereas the victim does not receive points for its unsecured territory because of the presence of the adversary’s stones.
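The scoring behavior that the exploit relies on can be sketched in a few lines. Below is a minimal Tromp-Taylor area scorer (an illustrative sketch, not KataGo's implementation): each player scores one point per stone on the board, plus one point per empty point in regions bordered exclusively by their own stones. An empty region touching both colors scores for neither player, which is why the adversary's weak stones inside the victim's unsecured territory nullify it.

```python
# Minimal Tromp-Taylor area scorer (illustrative sketch, not KataGo's code).
# Score = own stones + empty regions bordered only by own stones.
from collections import deque

def tromp_taylor_score(board):
    """board: list of equal-length strings of 'B', 'W', '.'.
    Returns (black_points, white_points)."""
    rows, cols = len(board), len(board[0])
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            pt = board[r][c]
            if pt in score:
                score[pt] += 1            # every stone on the board counts
            elif (r, c) not in seen:
                # flood-fill an empty region, noting which colors border it
                region, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            n = board[ny][nx]
                            if n == "." and (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                            elif n in score:
                                borders.add(n)
                if len(borders) == 1:     # region surrounded by a single color
                    score[borders.pop()] += region
    return score["B"], score["W"]
```

On a toy board where Black holds a clean corner while a stray black stone sits inside White's larger area, White's territory scores nothing: the game ends in a tie on points despite White surrounding far more of the board, which mirrors how the adversary's unsettled stones void the victim's territory at scoring time.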

Victim: Latest, no search

Adversary: 34.1 million training steps, 600 visits


KataGo with 8 visits

A search budget of 8 visits per move is around the limit of what our pass-based adversary can exploit. We achieve a win rate of 87.8% against this victim by modeling the victim perfectly during the adversary's search. The adversary wins with the same corner-staking strategy; it loses when the victim plays the game out to the end, resulting in a very full board.
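The idea behind modeling the victim during search can be sketched abstractly. In the toy recursion below (an illustrative sketch with hypothetical names, not KataGo's or our actual A-MCTS implementation), nodes where the adversary moves are maximized over, but nodes where the frozen victim moves are not expanded adversarially: the search follows the single move the modeled victim would actually play.

```python
# Illustrative sketch of victim-modeling lookahead (hypothetical structure,
# not the paper's actual search code). At adversary nodes we maximize; at
# victim nodes we follow the frozen victim's predicted move.

def adversary_value(state, tree, turns, victim_policy, leaf_value):
    """Value of `state` from the adversary's perspective.

    tree[s]          -> dict of move -> successor (absent for terminal states)
    turns[s]         -> "adv" or "victim", whoever moves at s
    victim_policy(s) -> the move the modeled victim plays at s
    leaf_value[s]    -> terminal value for the adversary
    """
    if state not in tree:                    # terminal position
        return leaf_value[state]
    if turns[state] == "adv":                # adversary node: maximize
        return max(adversary_value(tree[state][m], tree, turns,
                                   victim_policy, leaf_value)
                   for m in tree[state])
    # victim node: perfectly modeled, so only its actual move is explored
    m = victim_policy(state)
    return adversary_value(tree[state][m], tree, turns,
                           victim_policy, leaf_value)
```

The design choice this illustrates: against a perfectly modeled victim, the adversary can pursue lines that a worst-case (minimax) opponent would refute, because the search knows the victim will blunder at the critical node.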


Victim: Latest, 8 visits

Adversary: 34.1 million training steps, 200 visits, recursive modeling


Citation Info

@inproceedings{wang2023adversarial,
  title={Adversarial Policies Beat Superhuman Go {AI}s},
  author={Wang, Tony T. and Gleave, Adam and Tseng, Tom and Pelrine, Kellin and Belrose, Nora and Miller, Joseph and Dennis, Michael D and Duan, Yawen and Pogrebniak, Viktor and Levine, Sergey and Russell, Stuart},
  booktitle={International Conference on Machine Learning},
  year={2023},
  eprint={2211.00241},
  archivePrefix={arXiv}
}
@misc{tseng2024ais,
  title={Can Go {AI}s be adversarially robust?},
  author={Tseng, Tom and McLean, Euan and Pelrine, Kellin and Wang, Tony T. and Gleave, Adam},
  year={2024},
  eprint={2406.12843},
  archivePrefix={arXiv}
}