February 2023 safety news: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback
dpaleka.substack.com
Better version of the monthly Twitter thread. More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models Security flaws in LMs with API calling capabilities. Prompt injections are actually dangerous when the user doesn't control all the context.
February 2023 safety news: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback
February 2023 safety news: Unspeakable…
February 2023 safety news: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback
Better version of the monthly Twitter thread. More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models Security flaws in LMs with API calling capabilities. Prompt injections are actually dangerous when the user doesn't control all the context.