Human games
Human amateur beats cyclic adversary
Our strongest adversarial policy (trained against Latest_def) is able to reliably beat KataGo at superhuman strength settings. However, a member of our team (Tony Wang) who is a novice Go player managed to convincingly beat this same adversary. This confirms that our adversarial policy is not generally capable, despite it beating victim policies that can themselves beat top human professionals. Instead, our victim policy harbors a subtle vulnerability.
Our evaluation is imperfect in one significant way: the adversary was not playing with an accurate model of its human opponent (rather, it modeled Tony as Latest with 1 visit). However, given the poor transferability of our adversary to different KataGo checkpoints (see Figure 5.1 of the paper), we predict that the adversary would not win even if it had access to an accurate model of its human opponent.
Victim: Tony Wang (author)
Adversary: Cyclic adversary, 545 million training steps, 600 visits
Victim color | Win color | Adversary win | Score difference | Game length
---|---|---|---|---
w | w | False | -65.5 | 194
b | b | False | -36.5 | 253
Human amateur beats pass adversary
The same Go novice (Tony Wang) also managed to beat our pass adversary by a large margin of over 250 points. This demonstrates our pass adversary is also not generally capable.
Victim: Tony Wang (author)
Adversary: Pass adversary, 34.1 million training steps, 600 visits
Victim color | Win color | Adversary win | Score difference | Game length
---|---|---|---|---
w | w | False | -314.5 | 428
b | b | False | -253.5 | 473
Human exploits KataGo
A Go expert (Kellin Pelrine) was able to learn and apply the cyclic adversary's strategy to attack multiple types and configurations of AI Go systems. In this example, he exploited KataGo with 100K visits, which would normally be strongly superhuman. Apart from prior study of our adversary's game records, no algorithmic assistance was used in this or any of the following examples. The KataGo network and weights used here were b18c384nbt-uec, a newly released version that the author of KataGo (David Wu) trained for a tournament. This network should be as strong as or stronger than Latest.
Victim: KataGo, 100K visits
Adversary: Kellin Pelrine (author)
Human exploits Leela Zero
The same Go expert (Kellin Pelrine) also exploited Leela Zero with 100K visits, which would likewise normally be superhuman.
Victim: Leela Zero, 100K visits
Adversary: Kellin Pelrine (author)
Human exploits Leela Zero 2
Kellin Pelrine also played 9 games against Leela Zero with 4096 visits, winning 6.
Victim: Leela Zero, 4096 visits
Adversary: Kellin Pelrine (author)
Human exploits a top KGS bot
Playing under standard human conditions on the online Go server KGS, the same Go expert (Kellin Pelrine) successfully exploited the bot JBXKata005 in 14/15 games. In the remaining game, the cyclic group attack still led to a successful capture, but the victim had enough points remaining to win. This bot uses a custom KataGo implementation, and at the time of the games was the strongest bot available to play on KGS.
Victim: JBXKata005, 9 dan on KGS
Adversary: Kellin Pelrine (author)
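The win counts reported above (6 of 9 against Leela Zero with 4096 visits, 14 of 15 against JBXKata005) come from small samples, so the raw percentages carry wide uncertainty. The following is a minimal sketch, not part of the original analysis, that puts a 95% Wilson score interval around each win rate. The function name `wilson_interval` is hypothetical, and the calculation assumes the games are independent trials, which may not hold for a human opponent who adapts between games.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%).

    Assumes each game is an independent trial with the same win probability.
    """
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need games > 0 and 0 <= wins <= games")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Win counts taken from the text above.
results = [
    ("Leela Zero, 4096 visits", 6, 9),
    ("JBXKata005 on KGS", 14, 15),
]

for label, wins, games in results:
    low, high = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} wins, 95% interval {low:.1%} to {high:.1%}")
```

The Wilson interval is used here rather than the simpler normal approximation because it behaves better for small samples and for proportions close to 0 or 1, as with 14 of 15.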
Human exploits top KGS bot with large handicap
In this last example, the same Go expert (Kellin Pelrine) exploited JBXKata005 while giving it a huge initial advantage through a 9-stone handicap. A top-level human player with this much advantage would have a virtually 100% win rate against any opponent, human or algorithmic.
Victim: JBXKata005, 9 dan on KGS, with 9-stone handicap
Adversary: Kellin Pelrine (author)