Interpreting Intelligence Part 2

Key Points:

Exploring Algorithmic Flexibility in AI: This installment investigates how AI models, particularly neural networks, demonstrate adaptability and flexibility in discovering and implementing new algorithms.
Beyond Data Patterns: AI’s potential extends beyond identifying patterns in data to discovering new algorithms. Neural networks reveal a surprising diversity of algorithmic solutions, showcasing their ability to implement previously undescribed and less intuitive algorithms.
Clock and Pizza Study: Researchers from MIT, including Max Tegmark, examined whether neural networks trained on a well-understood algorithmic task can rediscover known algorithms. The study focused on modular addition, illustrating how AI can solve tasks using different approaches.
Rediscovering Algorithms: The study showed that neural networks could discover both the “Clock” algorithm (which works the way people read a clock face and is intuitive for humans) and a novel “Pizza” algorithm (using sectors within a circle). Which of the two a network learned depended on its architecture.
Influence of Architecture: The network’s choice between the Clock and Pizza algorithms was influenced by the balance of attention mechanisms and linear layers. Higher attention rates favored the Clock algorithm, while lower rates favored the Pizza algorithm, highlighting how architectural focus affects problem-solving strategies.

In part 1 of this series we looked at why new capabilities might emerge at scale.

This week we look at work that gives us insight into the nature of algorithmic flexibility in models.

Algorithmic Adaptability and Flexibility

One of the promises of AI is that it discovers things humans can’t. We usually think of this capability as finding patterns in data, and that is a large part of why we value it. But beyond correlations in data, the bigger promise is that AI might help us discover new algorithms. On a tiny scale, it appears that neural networks are able to do this. Even better, networks reveal a surprising diversity of algorithmic solutions. It’s not just about learning patterns any more: researchers are finding that networks can implement previously undescribed and less intuitive algorithms.

In a recent paper (again from MIT and again with Max Tegmark as a co-author), researchers sought to go deeper on work by Neel Nanda (one of the pioneers in mechanistic interpretability) and investigate whether a neural network that is trained on a well-understood algorithmic task can reliably rediscover known algorithms for solving that task.

It is like asking whether a chef, given a range of ingredients and an example of the finished dish but no recipe, will arrive at the recipe we already know or find another way to make the same meal.

The study is called the "Clock and the Pizza" study after the two algorithms it found for a well-understood math task: modular addition. This may sound harder than it is: if you have a meeting at 10am and it’s scheduled to last for three hours, what time will it finish? The convention for expressing this in modular addition is 10 + 3 = 1 (mod 12), intuitive for anyone who knows how to tell time. In the Clock algorithm, numbers are represented as points around a circle, and the answer comes from the positions and movements of those points, much like how the hands of a clock move and indicate time.
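To make the clock picture concrete, here is a minimal Python sketch of adding two numbers by rotating around a circle. It is an illustration of the idea only, not the trained network's weights or the researchers' code; the function name `clock_add` and the choice of a single circle are our own assumptions.

```python
import math

def clock_add(a: int, b: int, p: int = 12) -> int:
    """Add a and b modulo p by composing two rotations on a circle.

    Illustrative sketch only: the name and structure are hypothetical.
    """
    # Place each number on the circle as an angle.
    angle_a = 2 * math.pi * a / p
    angle_b = 2 * math.pi * b / p

    # Combine the two angles with the angle-addition identities,
    # which is the same as rotating by a and then by b.
    cos_sum = math.cos(angle_a) * math.cos(angle_b) - math.sin(angle_a) * math.sin(angle_b)
    sin_sum = math.sin(angle_a) * math.cos(angle_b) + math.cos(angle_a) * math.sin(angle_b)

    # Read off the answer: pick the number whose point on the circle
    # lies closest to the combined rotation.
    def score(c: int) -> float:
        angle_c = 2 * math.pi * c / p
        return cos_sum * math.cos(angle_c) + sin_sum * math.sin(angle_c)

    return max(range(p), key=score)
```

For the meeting example, `clock_add(10, 3)` returns 1. Note that a result of 0 corresponds to 12 o'clock on a real clock face.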

But there are other ways to solve this problem and, in the study, the network discovered an alternative. The researchers called it the Pizza algorithm because data points are represented inside a circle, similar to how pepperoni might be spread across a pizza. The approach involves dividing the space into sectors, like slices of a pizza, and the algorithm determines solutions based on which sector a data point falls into.
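A simplified sketch of the pizza idea, under our own assumptions: place both numbers on the circle, take the midpoint between them (which lands inside the circle), and use the direction of that midpoint to recover the sum. The function name `pizza_add` and the angle-doubling step are stand-ins chosen for this sketch; the trained networks in the study use a more involved readout. The sketch also assumes an odd modulus, because with an even one two opposite points have a midpoint at the exact center, which carries no direction.

```python
import math

def pizza_add(a: int, b: int, p: int = 59) -> int:
    """Add a and b modulo an odd p using only the midpoint of their two points.

    Hypothetical illustration; not the network's actual computation.
    """
    w = 2 * math.pi / p

    # Midpoint of the two points on the unit circle. It lies inside
    # the circle, along the direction of the "average" angle.
    mid_x = (math.cos(w * a) + math.cos(w * b)) / 2
    mid_y = (math.sin(w * a) + math.sin(w * b)) / 2

    # The midpoint's direction encodes (a + b) / 2 only up to a half turn.
    # Squaring it as a complex number doubles the angle and removes
    # that ambiguity (an assumption made for this sketch).
    dbl_x = mid_x * mid_x - mid_y * mid_y
    dbl_y = 2 * mid_x * mid_y

    # Read off the number whose point on the circle best matches that direction.
    def score(c: int) -> float:
        return dbl_x * math.cos(w * c) + dbl_y * math.sin(w * c)

    return max(range(p), key=score)
```

All pairs with the same sum land on the same line through the center, which is where the slice picture comes from.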

Neural networks discovered both the Clock and the Pizza algorithm. Which one a given network settled on depended on the balance of attention mechanisms and linear layers. Why?

The network's choice between the Clock and Pizza algorithms was influenced by its architecture. The balance between attention mechanisms and linear layers in the network's design played a crucial role. With a higher attention rate, the network tended toward the Clock algorithm, which benefits from the complex pattern recognition enabled by attention mechanisms. In contrast, with lower attention rates where linear layers are more dominant, the network gravitated towards the Pizza algorithm, which relies less on complex operations and more on simpler, linear computations. This variation in architectural focus explained why some networks ended up with one algorithm and some with the other.
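One way to picture an "attention rate" is as a dial between ordinary attention weights and constant weights that ignore the input. The Python sketch below rests on that assumption, which is our reading of the study's setup; the function names and the plain-list representation are hypothetical, and this is not the researchers' code.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw attention scores into weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blended_attention(scores: list[float], attention_rate: float) -> list[float]:
    """Blend softmax attention with constant weights.

    attention_rate = 1.0 keeps ordinary attention;
    attention_rate = 0.0 replaces it with constant weights of 1.0
    (an assumption of this sketch).
    """
    weights = softmax(scores)
    return [attention_rate * w + (1.0 - attention_rate) * 1.0 for w in weights]
```

At a rate of 1.0 the weights still depend on the input scores; at 0.0 every position gets the same fixed weight, so the layer behaves like a plain linear one.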

This is significant for our understanding of how neural networks learn because it demonstrates their ability to adaptively choose between different problem-solving strategies based on their structural and operational configurations. This adaptability suggests that neural networks are not just passively learning patterns but actively exploring and employing different methods to optimize their performance, indicating a more dynamic and nuanced process of learning than previously understood.

The transition from one algorithm to another is not just a switch but a phase transition, like water boiling into steam. It reflects a deeper principle in AI learning: a network's capacity to adapt and discover diverse solutions, influenced by the nuances of its architecture.

Intriguingly, the nature of the network’s ReLU function (rectified linear unit, a popular neuron activation function), which processes inputs in a simple, straightforward manner, significantly influenced the network's choice of algorithm. This reveals how the basic elements of a neural network, like the type of processing it uses, can direct its learning path. It underscores that the design choices in an AI's architecture play a critical role in determining its behavior and problem-solving abilities.
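For readers who have not met ReLU before, it simply passes positive inputs through and turns negative inputs into zero. The short Python sketch below shows that, plus one thing two ReLUs can build cheaply: an absolute value. The helper names are our own, and the sketch does not show that this is the mechanism behind the influence described above.

```python
def relu(x: float) -> float:
    """Rectified linear unit: keep positive inputs, zero out negative ones."""
    return max(0.0, x)

def abs_via_relu(x: float) -> float:
    """Absolute value built from two ReLUs (illustrative helper)."""
    return relu(x) + relu(-x)
```

For example, `relu(-2.0)` gives 0.0, `relu(3.0)` gives 3.0, and `abs_via_relu(-2.5)` gives 2.5.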

Spooky.

Next week, in part 3, we look at what we've learned this year about "grokking", discovered in 2021, where generalization happens abruptly, long after the network has fit the training data.

Helen Edwards