Game analysis
Qualitative analysis of adversary behavior
An expert-level (6 dan) human player on our team (Kellin Pelrine) analyzed the following game. It shows typical behavior and outcomes when an adversary trained against a pass-hardened KataGo victim plays that victim: the victim builds an early and seemingly insurmountable lead, the adversary sets a trap that would be easy for a human to see and avoid, and the oblivious victim collapses.
The adversary plays non-standard, subpar moves right from the beginning. By move 9, the victim estimates its win rate at over 90%, and a human in a high-level match would likewise hold a large advantage from this position.
On move 20, the adversary initiates a tactic we see consistently: producing a 'dead' (at least by normal judgment) square-four group in one quadrant of the board. Elsewhere, the adversary plays low, mostly on the second and third lines. This is also common in its games, and it leads the victim to turn the rest of the center into its sphere of influence. We suspect this helps the adversary later play moves in that area without the victim responding directly, because the victim is already strong there and feels confident ignoring a number of moves.
On move 74, the adversary begins mobilizing its 'dead' stones to set up an encirclement. Over the next 100+ moves, it gradually surrounds the victim in the top left. A key pattern here is that it leads the victim into forming an isolated group that loops around and connects to itself (a group whose stones form a cycle rather than a tree). David Wu, the creator of KataGo, suggested that Go-playing agents like the victim struggle to accurately judge the status of such groups, but that they are normally very rare. This adversary seems to produce them consistently.
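To make the "group with a cycle" idea concrete, here is a minimal illustrative sketch (not code from the paper, and all names are our own): a group that loops around and connects to itself encloses at least one board point. One simple way to detect this is to flood-fill the non-group points from the board edge; any non-group point left unreached is surrounded by the group, i.e. the group forms a loop around it.

```python
from collections import deque

def encloses_region(stones, size):
    """Return True if the group `stones` (a set of (row, col) coordinates
    on a size x size board) loops around and encloses some board point.

    Flood-fills all non-group points reachable from the board edge; if any
    non-group point is unreached, the group surrounds it."""
    blocked = set(stones)
    # Seed the search with every edge point that is not part of the group.
    seen = {
        (r, c)
        for r in range(size)
        for c in range(size)
        if (r in (0, size - 1) or c in (0, size - 1)) and (r, c) not in blocked
    }
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in seen):
                seen.add(nxt)
                queue.append(nxt)
    non_group_points = size * size - len(blocked)
    return len(seen) < non_group_points

# A connected ring of 8 stones around (2, 2) on a 5x5 board encloses it:
ring = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)}
# A straight line of stones is tree-shaped and encloses nothing:
line = {(0, 0), (0, 1), (0, 2)}
```

Real cyclic groups in these games are far larger and messier than this toy ring, but the structural property is the same: the group's stones wrap around territory and reconnect with themselves.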
Until the adversary plays move 189, the victim could still save that cyclic group, and in turn still win by a huge margin. There are straightforward moves to do so that would be trivial for any human playing at the victim's normal level to find; even a human who had been playing for only a few months might find them. For instance, on move 189 the victim could instead have played at the point marked 'A'. But after move 189, escape is impossible and the game is reversed. The victim seems to have been unable to detect the danger. Play continues for another 109 moves, but there is no chance for the victim (nor would there be for a human player) to recover from the massive deficit it was tricked into.
Victim: Latest_def, 1600 visits
Adversary: 498 million training steps, 600 visits
How the victim's predicted win rate varies over time
In this game, we find the victim's predicted win rate oscillates several times before the victim's group is captured on move 273. At move 248, the victim predicted it would win with 91% confidence, yet by its next turn at move 250 its prediction had dropped below 1%. At move 254, it jumped back above 99%. A few moves later, the victim's win rate prediction again fluctuated dramatically, hitting <1% at move 266, 99% at move 268, and <1% at move 272. After the capture on the following turn, the victim (correctly) predicted a <1% win rate for the rest of the game.
Victim: Latest, 4096 visits
Adversary: 545 million training steps, 600 visits
Positions analyzed with varying visits
Here we make available the full game records for the positions analyzed with different numbers of visits in the paper appendix discussing the role of search in robustness. For details, please refer to that appendix.
Victim: Latest_def, 1600 visits
Adversary: 498 million training steps, 600 visits
Positions analyzed with 1 billion visits
The following game records correspond to positions that were analyzed with 1 billion visits, at which the victim still failed to find the correct move. The original victim that played these games used 1 million visits. For details, please refer to the paper appendix discussing the role of search in robustness.
Victim: Latest, 1 million visits
Adversary: 545 million training steps, 600 visits