Interpreting Intelligence Part 3: What is Happening When Models Learn to Generalize
Three spooky things we’ve learned in 2023 about how neural networks learn. Part 3.
This week, in part 3, we look at what we've learned this year since the 2021 discovery of "grokking," in which generalization happens abruptly, long after the model has fit the training data.
Switching from Memorizing to Generalizing
In 2021, researchers training tiny models made a surprising discovery. A set of models suddenly flipped from memorizing their training data to correctly generalizing on unseen inputs after being trained for a much longer time. Since then, this phenomenon—called “grokking”—has been investigated further and reproduced in many contexts, at larger scale.
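The original experiments trained small networks on modular arithmetic. The sketch below is not the original authors' code; it is a minimal, illustrative reconstruction of that setup, a tiny one-hidden-layer network trained with weight decay on (a + b) mod p, where all hyperparameters (the modulus, hidden width, learning rate, step count) are assumptions chosen for brevity:

```python
# Minimal sketch (illustrative, not the original experiment's code) of the
# classic grokking setup: a tiny network trained on modular addition with
# weight decay, for far longer than it takes to fit the training split.
import numpy as np

rng = np.random.default_rng(0)
p = 23  # small modulus: the full (a, b) table has p*p examples

# Full dataset: each input is one-hot(a) concatenated with one-hot(b).
pairs = [(a, b) for a in range(p) for b in range(p)]
X = np.zeros((len(pairs), 2 * p))
y = np.array([(a + b) % p for a, b in pairs])
for i, (a, b) in enumerate(pairs):
    X[i, a] = 1.0
    X[i, p + b] = 1.0

# Random train/test split over the full table.
idx = rng.permutation(len(pairs))
n_train = len(pairs) // 2
train, test = idx[:n_train], idx[n_train:]

# One-hidden-layer ReLU MLP, full-batch gradient descent plus weight decay.
h = 64
W1 = rng.normal(0, 0.1, (2 * p, h))
W2 = rng.normal(0, 0.1, (h, p))
lr, wd = 0.3, 1e-3

def accuracy(ids):
    logits = np.maximum(X[ids] @ W1, 0) @ W2
    return float((logits.argmax(1) == y[ids]).mean())

for step in range(500):  # real grokking runs train for many more steps
    a1 = X[train] @ W1
    hidden = np.maximum(a1, 0)
    logits = hidden @ W2
    # Softmax cross-entropy gradient.
    z = logits - logits.max(1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(1, keepdims=True)
    probs[np.arange(n_train), y[train]] -= 1.0
    g = probs / n_train
    gW2 = hidden.T @ g
    gW1 = X[train].T @ ((g @ W2.T) * (a1 > 0))
    # Weight decay is thought to drive the eventual "clean-up" phase.
    W1 -= lr * (gW1 + wd * W1)
    W2 -= lr * (gW2 + wd * W2)

acc_train, acc_test = accuracy(train), accuracy(test)
```

In runs like this, training accuracy typically saturates early, while test accuracy can stay near chance for a long stretch before jumping; the gap between `acc_train` and `acc_test` over time is what makes the phenomenon visible.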
Generalization is a three-stage process. Initially, models memorize their training data. They then form intricate internal circuits that actually solve the problem. Finally, they refine these solutions: in a "clean-up" phase, they shed the now-redundant memorized components.
Though the shift appears sudden in performance metrics, the process is gradual and nuanced under the surface. Train and test curves, which track learning over time, show training accuracy saturating early while test accuracy stays flat for a long stretch, then jumps; measures of the model's internal structure, by contrast, progress steadily throughout. The sudden shift is evidence of the complex, layered nature of AI learning, where transformative moments are built upon a foundation of gradual, consistent progress.