Early adversarial training
David Wu (lightvector), the creator and primary developer of KataGo, has incorporated adversarial training against the cyclic exploit into KataGo's official self-play training run since December 2022. The adversarial training starts a small fraction (~0.1%) of self-play games in positions where the cyclic exploit is being executed; the remaining games are regular self-play games. This adversarial training has been partially successful: the adversarially trained networks are able to beat our original cyclic adversary. However, we are able to fine-tune our original adversary to defeat these updated networks. This suggests that defending against the cyclic exploit is non-trivial, unlike the pass exploit, which we were able to patch manually. Developing techniques to train agents that are immune to this attack while maintaining high Go strength remains an interesting open problem.
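The mixing of adversarial and regular starting positions described above can be sketched as follows. This is only an illustration of the idea, not KataGo's actual implementation: the function `choose_start`, the placeholder position identifiers, and the simulated game count are all hypothetical, and only the ~0.1% fraction comes from the text.

```python
import random

# ~0.1% of self-play games start from an exploit position (figure from the text).
ADVERSARIAL_START_FRACTION = 0.001


def choose_start(rng, exploit_positions, fraction=ADVERSARIAL_START_FRACTION):
    """Pick the starting position for one self-play game.

    Returns None for a regular game from the empty board, or one of the
    (hypothetical) pre-recorded cyclic-exploit positions otherwise.
    """
    if exploit_positions and rng.random() < fraction:
        return rng.choice(exploit_positions)
    return None


rng = random.Random(0)
# Placeholder identifiers; real positions would be board states.
positions = ["cyclic_position_a", "cyclic_position_b"]
starts = [choose_start(rng, positions) for _ in range(200_000)]
adversarial_share = sum(s is not None for s in starts) / len(starts)
print(f"adversarial starts: {adversarial_share:.3%}")
```

Because the adversarial games are such a small share of the total, most of the training signal still comes from ordinary self-play, which is consistent with the goal of keeping overall Go strength high.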
This page shows our results against the KataGo network kata1-b60c320-s7701878528-d3323518127, abbreviated to b60-s7702m and released in May 2023. These results are superseded by our results against a December 2023 network, but we preserve them here since they are linked in our first paper.
Original cyclic adversary loses to no-search b60-s7702m
b60-s7702m has had several months of adversarial training and defeats the original cyclic adversary in 1882/2000 = 94.1% of games even when b60-s7702m plays without search. b60-s7702m is stronger at defending the cyclic group. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)
Victim: b60-s7702m, no search
Adversary: Cyclic adversary, 545 million training steps, 600 visits
Victim color | Win color | Adversary win | Score difference | Game length
---|---|---|---|---
w | b | True | 104.5 | 318
w | b | True | 118.5 | 293
b | w | True | 69.5 | 371
b | w | True | 135.5 | 318
w | w | False | -367.5 | 476
w | w | False | -297.5 | 446
b | b | False | -242.5 | 399
b | b | False | -224.5 | 417
Fine-tuned cyclic adversary vs. 4096-visit b60-s7702m
After 168 million fine-tuning training steps, the cyclic adversary achieves a win rate of 188/400 = 47% against b60-s7702m with 4096 victim visits. The attack is still a cyclic attack, though the placement of the cyclic group has moved from the corner of the board to the center of one side of the board.
Victim: b60-s7702m, 4096 visits
Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits
Victim color | Win color | Adversary win | Score difference | Game length
---|---|---|---|---
w | b | True | 84.5 | 306
w | b | True | 105.5 | 333
b | w | True | 95.5 | 309
b | w | True | 112.5 | 294
w | w | False | -147.5 | 357
w | w | False | -113.5 | 308
b | b | False | -134.5 | 319
b | b | False | -94.5 | 317
Fine-tuned cyclic adversary vs. 100,000-visit b60-s7702m
The fine-tuned cyclic adversary also beats b60-s7702m using 100,000 victim visits with a win rate of 7/40 = 17.5%. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)
Victim: b60-s7702m, 100,000 visits, 10 search threads
Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits
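The win rates on this page rest on very different sample sizes (2000, 400, and 40 games), so the same arithmetic carries different amounts of uncertainty in each case. As a rough illustration, the sketch below recomputes each rate from the reported counts and adds a 95% Wilson score interval. The intervals are not part of the reported results; they are an assumption-laden addition that treats games as independent trials, and the helper name `wilson_interval` is ours.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """95% Wilson score interval for a binomial proportion (z = 1.96)."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half


# (wins, games) as reported on this page.
results = {
    "b60-s7702m (no search) vs. original adversary, victim wins": (1882, 2000),
    "fine-tuned adversary vs. 4096-visit b60-s7702m": (188, 400),
    "fine-tuned adversary vs. 100,000-visit b60-s7702m": (7, 40),
}

for label, (wins, games) in results.items():
    low, high = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} = {wins / games:.1%} "
          f"(95% Wilson interval {low:.1%} to {high:.1%})")
```

The 40-game result in particular yields a wide interval, so the 17.5% figure is best read as evidence that the attack still works at 100,000 visits rather than as a precise estimate of how often it works.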