Adversarial policies in Go

Cyclic attack

This section showcases games our cyclic adversary played against KataGo. We primarily attack KataGo network checkpoint b40c256-s11840935168-d2898845681, which we dub Latest since it is the latest confidently rated KataGo network at the time of conducting our experiments.

KataGo without search (top-100 European player level)

Without tree search, KataGo's Latest network plays at the strength of a top-100 European player. We trained an adversary that wins 100% of the time over 1000 games against this victim.¹ Our adversary (which we refer to as the "cyclic adversary") gets the victim to form a large circular structure, and then tricks the victim into allowing the circular structure to be killed. See the "Game Analysis" tab for a more in-depth analysis of this adversarial strategy.

^[1] The games below are actually against a version of Latest that was patched to be immune to a simpler pass-based attack. We applied this patch to the victim to force our adversary to learn a more interesting attack. The patch is a hardcoded defense that forbids the victim from passing until it has no more legal moves outside its territory. We call the patched victim Latest_def. Because we limit the victim's passing, games are usually played out to the end, terminating automatically once all points belong to a pass-alive group or pass-alive territory.
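The pass restriction described in the footnote can be illustrated with a minimal sketch. This is not the actual Latest_def patch: the function name `choose_move`, the move representation, and the rule for picking a replacement move when passing is forbidden (take the next legal move in the engine's preference order) are all assumptions made for illustration.

```python
from typing import List, Set, Tuple, Union

Point = Tuple[int, int]
PASS = "pass"  # hypothetical marker for the pass move
Move = Union[Point, str]


def choose_move(ranked_moves: List[Move],
                legal_points: Set[Point],
                own_territory: Set[Point]) -> Move:
    """Pick the most preferred move that the pass restriction allows.

    ranked_moves:  moves in the engine's order of preference, best first
                   (assumed input; may contain PASS).
    legal_points:  board points the victim may legally play.
    own_territory: points counted as the victim's own territory.
    """
    # Legal moves that lie outside the victim's own territory.
    outside = legal_points - own_territory

    for move in ranked_moves:
        if move == PASS:
            if not outside:
                return PASS  # nothing left outside territory: passing is allowed
            continue  # passing is forbidden, fall through to the next preference
        if move in legal_points:
            return move

    # Assumption: if the ranking had no playable move, play any legal
    # point outside territory, and pass only when none exists.
    return min(outside) if outside else PASS
```

For example, with `ranked_moves = [PASS, (3, 3)]`, `legal_points = {(3, 3)}`, and an empty `own_territory`, the sketch skips the pass and plays `(3, 3)`. Once every legal point lies inside the victim's territory, the same call returns `PASS`.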

Victim: Latest_def, no search

Adversary: 545 million training steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	106.5	385
w	b	True	110.5	347
w	b	True	128.5	345
b	w	True	121.5	376
b	w	True	137.5	378
b	w	True	137.5	352

KataGo with 4096 visits (superhuman)

With 4096 visits, KataGo's Latest network plays at a superhuman level. Nonetheless, our adversary still achieves a 97.3% win rate against Latest and a 95.7% win rate against the defended victim Latest_def. Games against Latest_def are shown below.

Victim: Latest_def, 4096 visits

Adversary: 545 million training steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	108.5	373
w	b	True	116.5	383
w	b	True	124.5	347
b	w	True	47.5	492
b	w	True	123.5	356
b	w	True	123.5	352
b	w	True	129.5	372
b	w	True	133.5	500
w	w	False	-231.5	396
w	w	False	-155.5	467
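The columns of these tables are redundant, which allows a quick consistency check. The sketch below parses the rows of the 4096-visit table and verifies that "Adversary Win" agrees with the two color columns. It also assumes, based on the signs in the table rather than on any statement in the text, that "Score difference" is reported from the adversary's perspective. The function name `check_rows` is hypothetical.

```python
import csv
import io

# Rows copied from the 4096-visit table (tab-separated).
TABLE = "\n".join([
    "Victim Color\tWin color\tAdversary Win\tScore difference\tGame length",
    "w\tb\tTrue\t108.5\t373",
    "w\tb\tTrue\t116.5\t383",
    "w\tb\tTrue\t124.5\t347",
    "b\tw\tTrue\t47.5\t492",
    "b\tw\tTrue\t123.5\t356",
    "b\tw\tTrue\t123.5\t352",
    "b\tw\tTrue\t129.5\t372",
    "b\tw\tTrue\t133.5\t500",
    "w\tw\tFalse\t-231.5\t396",
    "w\tw\tFalse\t-155.5\t467",
])


def check_rows(table_text: str) -> tuple:
    """Return (adversary wins, games) after checking each row for consistency."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    wins = 0
    games = 0
    for row in reader:
        listed = row["Adversary Win"] == "True"
        # The adversary wins exactly when the winner is not the victim's color.
        by_color = row["Win color"] != row["Victim Color"]
        # Assumption: a positive score difference favors the adversary.
        by_score = float(row["Score difference"]) > 0
        if listed != by_color or listed != by_score:
            raise ValueError(f"inconsistent row: {row}")
        wins += int(listed)
        games += 1
    return wins, games


wins, games = check_rows(TABLE)
print(f"adversary won {wins} of {games} displayed games")
```

The displayed games are only a small sample, so the fraction computed here should not be read as the reported win rate.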

KataGo with 10,000,000 visits

Our adversary with 600 visits still achieves a 72% win rate against Latest with 10,000,000 visits, demonstrating that a large amount of search is not a practical defense against the adversary.

Victim: Latest, 10,000,000 visits, 1024 search threads

Adversary: 545 million training steps, 600 visits

Victim Color	Win color	Adversary Win	Score difference	Game length
w	b	True	114.5	306
w	b	True	122.5	334
w	b	True	136.5	355
b	w	True	89.5	335
b	w	True	125.5	384
b	w	True	167.5	374
w	w	False	-127.5	440
b	b	False	-282.5	553
