Daniel Paleka: AI safety takes
I work on preventing bad outcomes of AI. I also read too many papers.

January 2023 safety news: Watermarks, Memorization in Stable Diffusion, Inverse Scaling
Better version of the monthly Twitter thread.

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Not all paths lead to ROME. Apparently, knowing where a fact is stored doesn't help with amplifying or…

