My experience learning about another part of mechanistic interpretability: sparse autoencoders and the study of polysemanticity + superposition
How LLMs Represent So Many Features