We discovered simple adversarial strategies that beat superhuman Go AIs, and we now find that adding defenses mitigates but does not eliminate the problem. Our cyclic adversary beats the state-of-the-art KataGo AI more than 97% of the time at superhuman settings. The strategy is simple enough for an amateur human player to replicate, and it transfers to other superhuman Go AIs. Although positional and iterated adversarial training protect against the original cyclic adversary, the defended agents can still be exploited by new adversaries. We also trained a new Go AI based on vision transformers rather than convolutional neural networks, only to find that it remains vulnerable to the cyclic attack.
The original cyclic adversary (below, playing as white) works by forming an inside group of stones that the victim Go AI surrounds. The adversary then encircles the victim's surrounding group and captures it. Despite numerous opportunities to save this group, the victim fails to see the danger and remains confident of victory, even many moves after it has irreversibly lost. For more details on this attack, see our blog post, ICML 2023 presentation, or paper.
Since Go AIs were never explicitly designed with robustness in mind, we wondered whether simple defenses could make KataGo robust. We tested three natural defenses (illustrated below): positional adversarial training on hand-constructed board positions, iterated adversarial training against successively stronger adversaries, and changing the network architecture to a vision transformer.
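To make the iterated defense concrete, here is a minimal Python sketch of the alternating loop it implies. The helpers `train_adversary`, `fine_tune_victim`, and `win_rate` are hypothetical placeholders for an AlphaZero-style training pipeline, not KataGo's actual API; the sketch illustrates the alternating structure only.

```python
"""Minimal sketch of iterated adversarial training (illustrative only).

The helper functions below are hypothetical placeholders for an
AlphaZero-style training pipeline; they are not KataGo's actual API.
"""

import random


def train_adversary(frozen_victim):
    # Placeholder: train a fresh adversary via self-play against the
    # frozen victim until it reliably exploits that victim.
    return f"adversary-vs-({frozen_victim})"


def fine_tune_victim(victim, opponents):
    # Placeholder: fine-tune the victim on games against every adversary
    # discovered so far, so that earlier exploits stay patched.
    return f"{victim}#hardened{len(opponents)}"


def win_rate(adversary, victim):
    # Placeholder evaluation: play a batch of games and return the
    # adversary's win fraction (random here, purely for illustration).
    return random.random()


def iterated_adversarial_training(victim, n_iterations=3):
    """Alternate between attacking a frozen victim and hardening it."""
    adversaries = []
    for i in range(n_iterations):
        # 1. Attack: train a new adversary against the current victim.
        adversaries.append(train_adversary(frozen_victim=victim))
        # 2. Defend: fine-tune the victim against all known adversaries.
        victim = fine_tune_victim(victim, opponents=adversaries)
        print(f"iteration {i}: adversary win rate = "
              f"{win_rate(adversaries[-1], victim):.1%}")
    return victim, adversaries


if __name__ == "__main__":
    iterated_adversarial_training("victim-v0")
```

Fine-tuning the victim against every adversary found so far, rather than only the latest one, is intended to keep earlier exploits patched as new ones are discovered.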
Variants of the cyclic attack continue to beat all three defenses. Furthermore, we discovered two qualitatively new adversarial strategies. First, the agent hardened by positional adversarial training is vulnerable to a "gift" attack that sets up a "sending-two-receiving-one" situation in which the victim, for no valid reason, gifts the adversary two stones and then must capture one back. Second, the agent hardened by iterated adversarial training is vulnerable to an "atari" attack that induces the victim to build a large cyclic group containing "bamboo joints", which the adversary then threatens to split.
Our results suggest that achieving robustness is challenging, even in narrow domains such as Go. However, we noticed one positive signal: defending against any fixed, static attack was quick and easy. We think it may be possible to leverage this property to build a working defense in both Go and other settings. For more information on defenses, check out our latest blog post or paper.