January 2023 safety news: Watermarks…

Jan 31, 2023

A better version of the monthly Twitter thread.

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Not all paths lead to ROME. Apparently, knowing where a fact is stored (according to causality-based localization) doesn't tell you where to edit in order to amplify or erase that fact!

AI safety takes