Adversarial policies in Go

Early adversarial training

David Wu (lightvector), the creator and primary developer of KataGo, has incorporated adversarial training against the cyclic exploit into KataGo's official self-play training run since December 2022. The adversarial training starts a small fraction (~0.1%) of self-play games from positions in which the cyclic exploit is being executed; the remaining games are regular self-play games. This training has been partially successful: the adversarially trained networks beat our original cyclic adversary. However, we are able to fine-tune our original adversary to defeat these updated networks. This suggests that defending against the cyclic exploit is non-trivial, unlike the pass exploit, which we were able to patch manually. Developing techniques to train agents that are immune to this attack while maintaining high Go strength remains an interesting open problem.
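The game mix described above can be pictured with a minimal sketch. This is not KataGo's actual implementation: the function `choose_start`, the constant `EXPLOIT_START_FRACTION`, and the placeholder position names are all hypothetical, and only the ~0.1% fraction comes from the text.

```python
import random

# ~0.1% of self-play games start from a cyclic-exploit position (figure from the text).
EXPLOIT_START_FRACTION = 0.001


def choose_start(rng, exploit_positions, fraction=EXPLOIT_START_FRACTION):
    """Pick the start of one self-play game.

    Returns an exploit position with probability `fraction`,
    otherwise None, meaning a regular self-play game.
    """
    if exploit_positions and rng.random() < fraction:
        return rng.choice(exploit_positions)
    return None


# Hypothetical placeholders standing in for stored mid-exploit board positions.
positions = ["cyclic_position_a", "cyclic_position_b"]

rng = random.Random(0)
starts = [choose_start(rng, positions) for _ in range(100_000)]
n_exploit = sum(start is not None for start in starts)
print(f"{n_exploit} of {len(starts)} games started from an exploit position")
```

With a fraction this small, the overwhelming majority of games remain regular self-play, which is consistent with the goal of preserving overall Go strength while still exposing the network to the exploit.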

This page shows our results against the KataGo network kata1-b60c320-s7701878528-d3323518127, abbreviated to b60-s7702m and released in May 2023. These results are superseded by our results against a December 2023 network, but we preserve them here since they are linked in our first paper.

Original cyclic adversary loses to no-search `b60-s7702m`

After several months of adversarial training, b60-s7702m defeats the original cyclic adversary in 1882/2000 = 94.1% of games, even when b60-s7702m plays without search. The adversarial training has made it stronger at defending the cyclic group. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)

Victim: b60-s7702m, no search

Adversary: Cyclic adversary, 545 million training steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	104.5	318
w	b	True	118.5	293
b	w	True	69.5	371
b	w	True	135.5	318
w	w	False	-367.5	476
w	w	False	-297.5	446
b	b	False	-242.5	399
b	b	False	-224.5	417
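As a reading aid for the table above, here is a small sketch that parses the rows and checks how the columns relate. It assumes the rows are tab-separated exactly as listed, and it treats "Score difference" as being reported from the adversary's perspective, which is an inference from these rows rather than something the page states.

```python
import csv
import io

# Rows copied from the table above (tab-separated).
TABLE = (
    "Victim Color\tWin color\tAdversary Win\tScore difference\tGame length\n"
    "w\tb\tTrue\t104.5\t318\n"
    "w\tb\tTrue\t118.5\t293\n"
    "b\tw\tTrue\t69.5\t371\n"
    "b\tw\tTrue\t135.5\t318\n"
    "w\tw\tFalse\t-367.5\t476\n"
    "w\tw\tFalse\t-297.5\t446\n"
    "b\tb\tFalse\t-242.5\t399\n"
    "b\tb\tFalse\t-224.5\t417\n"
)

rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))

for row in rows:
    adversary_won = row["Adversary Win"] == "True"
    # The adversary wins exactly when the winning color is not the victim's color.
    assert adversary_won == (row["Win color"] != row["Victim Color"])
    # In these rows, a positive score difference coincides with an adversary win.
    assert adversary_won == (float(row["Score difference"]) > 0)

adversary_wins = sum(row["Adversary Win"] == "True" for row in rows)
print(f"{adversary_wins} adversary wins among {len(rows)} displayed games")
```

Because the displayed games are hand-picked, the split between wins and losses in this table says nothing about the overall win rate.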

Fine-tuned cyclic adversary vs. 4096-visit `b60-s7702m`

After 168 million steps of fine-tuning, the cyclic adversary achieves a win rate of 188/400 = 47% against b60-s7702m with 4096 victim visits. The attack is still cyclic, though the cyclic group has moved from the corner of the board to the center of one side.

Victim: b60-s7702m, 4096 visits

Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	84.5	306
w	b	True	105.5	333
b	w	True	95.5	309
b	w	True	112.5	294
w	w	False	-147.5	357
w	w	False	-113.5	308
b	b	False	-134.5	319
b	b	False	-94.5	317

Fine-tuned cyclic adversary vs. 100,000-visit `b60-s7702m`

The fine-tuned cyclic adversary also wins games against b60-s7702m at 100,000 victim visits, with a win rate of 7/40 = 17.5%. (The games displayed are non-randomly selected to show the wins achieved by the adversary.)

Victim: b60-s7702m, 100,000 visits, 10 search threads

Adversary: Cyclic adversary, 168 million fine-tuning steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	54.5	340
w	b	True	110.5	306
b	w	True	93.5	313
b	w	True	131.5	303
w	w	False	-181.5	418
w	w	False	-137.5	328
b	b	False	-168.5	411
b	b	False	-118.5	365
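The win rates quoted on this page can be rechecked with a few lines of arithmetic. The sketch below also attaches a 95% Wilson score interval to each rate to indicate how much the sample size matters (40 games versus 2000). The interval is an addition for illustration and is not part of the original analysis; the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# (description, wins, games) as reported on this page.
# The first entry is the victim's win rate; the other two are the adversary's.
REPORTED = [
    ("no-search victim vs. original adversary", 1882, 2000),
    ("fine-tuned adversary vs. 4096-visit victim", 188, 400),
    ("fine-tuned adversary vs. 100,000-visit victim", 7, 40),
]

for label, wins, games in REPORTED:
    rate = wins / games
    low, high = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} = {rate:.1%} "
          f"(95% Wilson interval {low:.1%} to {high:.1%})")
```

The 40-game result carries a much wider interval than the others, so the 17.5% figure is best read as evidence that the attack still works at high visit counts rather than as a precise win rate.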
