Interpreting Intelligence Part 2

Adaptability and flexibility of learned algorithms.


In part 1 of this series we looked at why new capabilities might emerge at scale.

This week we look at work that gives us insight into the nature of algorithmic flexibility in models.

Algorithmic Adaptability and Flexibility

One of the promises of AI is that it discovers things humans can’t. We usually think of this capability as finding patterns in data, and that is largely why we value it. But beyond correlations in data, the bigger promise is that AI might help us discover new algorithms. On a tiny scale, it appears that neural networks are able to do this. Even better, networks reveal a surprising diversity of algorithmic solutions. It’s not just about learning patterns any more: researchers are finding that networks can implement previously undescribed and less intuitive algorithms.

In a recent paper (again from MIT and again with Max Tegmark as a co-author), researchers built on work by Neel Nanda (one of the pioneers in mechanistic interpretability) to investigate whether a neural network trained on a well-understood algorithmic task can reliably rediscover known algorithms for solving that task.

It is a bit like a chef who is given a range of ingredients and an example of the final dish, but no recipe, and who figures out that there are many ways to create the meal and learns those recipes.

The study is called the "Clock and the Pizza" study after the two algorithms it describes for solving a well-understood math task: modular addition. This may sound harder than it is: if you have a meeting at 10am and it’s scheduled to last for three hours, what time will it finish? The convention for expressing this in modular addition is 10 + 3 = 1 (mod 12), intuitive for humans who know how to tell time. In the clock algorithm, numbers are represented as points around a circle, and the answer comes from the positions and movements of these points around the circle, much like how the hands of a clock move and indicate time.
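To make the clock idea concrete, here is a minimal sketch in Python. It is a hand-written illustration, not the trained network from the study: the function name clock_add and its parameters are made up for this example, and it assumes the clock approach can be summarized as "turn each number into an angle, add the angles, read off the nearest mark".

```python
import math

def clock_add(a: int, b: int, modulus: int = 12) -> int:
    """Add a and b modulo `modulus` by rotating around a circle (illustrative only)."""
    # Each number becomes an angle on a circle with `modulus` evenly spaced marks.
    angle_a = 2 * math.pi * a / modulus
    angle_b = 2 * math.pi * b / modulus
    # Adding the numbers corresponds to adding the angles. The angle-addition
    # identities do this using only products of sines and cosines.
    cos_sum = math.cos(angle_a) * math.cos(angle_b) - math.sin(angle_a) * math.sin(angle_b)
    sin_sum = math.sin(angle_a) * math.cos(angle_b) + math.cos(angle_a) * math.sin(angle_b)
    # Read the answer off the circle: find the mark closest to the resulting point.
    angle = math.atan2(sin_sum, cos_sum) % (2 * math.pi)
    return round(angle * modulus / (2 * math.pi)) % modulus

# The 10am meeting that runs for three hours.
print(clock_add(10, 3))  # prints 1
```

The result matches ordinary modular arithmetic, (10 + 3) % 12, which is the point: the circle is just another way of carrying out the same addition.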

But there are other ways to solve this problem and, in the study, the network discovered an alternative. The researchers called it the pizza algorithm because data points are represented inside a circle, similar to how pepperoni might be spread across a pizza. The approach involves dividing the space into sectors, like slices of a pizza, and the algorithm determines solutions based on which sector a data point falls into.
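A rough way to picture the pizza idea in code is below. This is a simplified sketch under stated assumptions, not the network's actual computation: it assumes the two numbers sit on the edge of a circle, that their midpoint lands somewhere inside it, and that the direction of that midpoint (the slice it falls in) is enough to recover the sum. The name pizza_add and the odd modulus of 59 are choices made for this example. An odd modulus is assumed so that no two numbers sit exactly opposite each other, which would put the midpoint at the center, where it has no direction.

```python
import math

def pizza_add(a: int, b: int, modulus: int = 59) -> int:
    """Add a and b modulo an odd `modulus` using the midpoint of two points (illustrative only)."""
    # Place both numbers on the edge of the circle.
    angle_a = 2 * math.pi * a / modulus
    angle_b = 2 * math.pi * b / modulus
    # Their midpoint falls inside the circle, like a topping on a pizza.
    mid_x = (math.cos(angle_a) + math.cos(angle_b)) / 2
    mid_y = (math.sin(angle_a) + math.sin(angle_b)) / 2
    # The midpoint lies on the line through the center at the average of the
    # two angles, possibly flipped by half a turn. Doubling its angle removes
    # that ambiguity and gives the angle of the sum.
    doubled = (2 * math.atan2(mid_y, mid_x)) % (2 * math.pi)
    # Read off the nearest mark on the circle.
    return round(doubled * modulus / (2 * math.pi)) % modulus

print(pizza_add(40, 30) == (40 + 30) % 59)  # prints True
```

Every pair of numbers with the same sum produces a midpoint on the same line through the center, so knowing which slice the midpoint falls in is enough to answer the question.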

The neural network discovered both the clock and the pizza algorithm. Which of the two it settled on shifted depending on the balance of attention mechanisms and linear layers. Why?
