Better version of the monthly Twitter thread.

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Not all paths lead to ROME. Apparently, knowing where a fact is stored doesn’t help with amplifying or…

Daniel Paleka: AI safety takes