Interpreting Intelligence Part 1

Why new capabilities might emerge at scale.


In AI circles there is a famous essay by Rich Sutton called The Bitter Lesson. Its core idea is this: the biggest lesson to be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. Writing in 2019, Sutton was prescient in claiming that search and learning are the methods that scale arbitrarily, with the implication that focusing on them will yield the greatest gains.

Now, in 2023, the scaling laws of large language models are firmly entrenched in researchers’ minds. Scaling laws in AI refer to the observed pattern that, as models increase in size, data, and computational resources, their performance improves predictably, often following a power law, with enhanced capabilities and new functionalities emerging along the way. This has led many (though by no means all) researchers to believe that AGI will arrive within a few years, simply through more data and larger models, powered by ever bigger compute.
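
To make “predictable improvement” concrete, here is a minimal sketch of what a power-law scaling relation looks like. The constants (the coefficient, the exponent, and the irreducible loss term) are invented purely for illustration and are not taken from any published fit.

```python
# Toy illustration of a power-law scaling relation of the form
#   loss(N) = A * N**(-ALPHA) + IRREDUCIBLE_LOSS
# where N is the number of model parameters.
# The constants below are made up for illustration only.
A, ALPHA, IRREDUCIBLE_LOSS = 10.0, 0.07, 1.7

def predicted_loss(n_params: float) -> float:
    """Predicted loss for a hypothetical model with n_params parameters."""
    return A * n_params ** (-ALPHA) + IRREDUCIBLE_LOSS

# Each 10x increase in parameters buys a smooth, predictable drop in loss.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")
```

The point of the sketch is the shape of the curve, not the numbers: each order-of-magnitude increase in scale yields a steady, diminishing but reliable improvement.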

What fascinates me is unpacking what drives these rapid improvements: is it sheer scale, or intrinsic learning dynamics?

On one hand, more data and more parameters reliably improve performance. Learning algorithms leverage this volume by discovering useful patterns; scope and repetition aid pattern recognition.

Yet model architecture matters too: networks have inherent inductive biases. Attention mechanisms concentrate signal, and convolutions exploit spatial locality. Do structures like these explain some of the progress?
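
As a rough sketch of those two biases (not a claim about any particular model), the toy code below shows attention weights concentrating on the key most similar to the query, and a convolution whose output depends only on a small local window. All values are random or hand-picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Attention: weights are a softmax over query-key similarity, so the
# mechanism can concentrate most of its mass on a few relevant positions.
def attention_weights(query, keys):
    scores = keys @ query / np.sqrt(len(query))
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

query = rng.normal(size=8)
keys = rng.normal(size=(5, 8))
keys[2] = query                        # make position 2 match the query
print(attention_weights(query, keys))  # most weight lands on index 2

# --- Convolution: each output depends only on a small local window,
# a built-in bias toward spatial locality.
signal = rng.normal(size=16)
kernel = np.array([0.25, 0.5, 0.25])
print(np.convolve(signal, kernel, mode="valid"))
```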

We are steadily gaining more insight into just how flexible and powerful neural networks are by examining specific features of how they “think”. This is the territory of the emerging field of mechanistic interpretability (or “mech-int” if you’re in the tribe). The core intuition is that models learn human-comprehensible things and can be understood.

Mechanistic interpretability in AI is like trying to understand a rock band. Just as each musician and instrument in the band contributes to the music, in AI every model component plays a role in the system’s behavior. By dissecting the “rock band” of an AI model and analyzing how each “instrument” or component contributes, we can gain a clearer understanding of how AI learns, making its processes more transparent and comprehensible. So far, mechanistic interpretability techniques have largely been confined to small-scale models and controlled scenarios. The bet is that these methods will scale to larger, more complex networks, but no one knows this for sure. Hopefully, insights gained from smaller models will hold when applied to larger AI systems.
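
To give the flavor of “analyzing each instrument’s contribution”, here is a hypothetical toy example: a single linear layer whose output decomposes exactly into per-feature contributions, analogous in spirit to attributing a transformer’s output to individual heads or neurons. The weights and inputs are invented for illustration.

```python
import numpy as np

# Hypothetical toy "model": one linear layer whose output we can decompose
# exactly into per-feature contributions -- each "instrument's" share of the song.
weights = np.array([2.0, -1.0, 0.5])   # one weight per input feature
features = np.array([1.0, 3.0, -2.0])  # a single input example

contributions = weights * features      # per-component contribution
output = contributions.sum()            # the model's overall output

for i, c in enumerate(contributions):
    print(f"feature {i} contributes {c:+.2f}")
print(f"total output: {output:.2f}")
```

Real mechanistic interpretability work tackles far messier, nonlinear components, but the goal is the same: account for the whole by understanding the parts.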

This article is in three parts: why new capabilities might emerge at scale, the adaptability and flexibility of learned algorithms, and what is happening when models learn to generalize.
