January 2023 safety news: Watermarks, Memorization in Stable Diffusion, Inverse Scaling
dpaleka.substack.com
Better version of the monthly Twitter thread.

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Not all paths lead to ROME. Apparently, knowing where a fact is stored doesn’t help with amplifying or erasing that fact!