November 2022 safety news: Mode collapse in InstructGPT, Adversarial Go
dpaleka.substack.com
Better version of the monthly Twitter thread.

Adversarial Policies Beat Professional-Level Go AIs

Adversarial policies in Go. Go policies trained via self-play can be exploited by an adversary that steers the game toward out-of-distribution board states. The adversary is trained with a simple AlphaZero-like tree search method.