The Science
How Network Theory Might Explain Emergent Abilities in AI
This research opens up vast possibilities for AI's role in solving complex problems, but it also underscores the importance of understanding this emergent behavior, especially as we head towards a world of multimodal models and agentic AI.
Key Points:
- There is debate about whether abilities like analogical reasoning emerging in AI models are real or an artifact of how we evaluate them.
- Interacting with powerful language models leads many to intuit that new understanding is emerging beyond what was input.
- A new study applies network theory to explore how complex skills can emerge in language models.
- The network concept maps relationships between "skill nodes" and "text nodes," tracking AI competence.
- As models grow, new connections form between skills and texts, enabling handling of more complex language.
- The study suggests huge potential for models to develop skills without direct exposure, going beyond "stochastic parrots."
Do Large Language Models such as ChatGPT exhibit emergent properties? For instance, when asked to compare a concept from one domain to an unrelated domain, ChatGPT comes up with a creative analogy, which feels like an emergent understanding of abstract relationships. Or take math: some LLMs have demonstrated problem-solving abilities that were not directly taught but developed from the models' exposure to mathematical language and logic in their training datasets.
There has been much debate about this phenomenon. One of the top papers at NeurIPS 2023, "Are Emergent Abilities of Large Language Models a Mirage?", suggested that what we perceive as the spontaneous development of new capabilities is actually an artifact of the metrics researchers use to evaluate these systems. The seemingly sudden leaps in AI performance and adaptability could be more about how we measure success than about any intrinsic evolution of the AI itself.
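To see that metric-artifact argument concretely, here is a toy sketch with entirely made-up numbers (the scale axis and accuracy curve are my own invention, not figures from the paper): per-token accuracy can improve smoothly with scale, yet an all-or-nothing score such as exact match on a multi-token answer stays near zero and then climbs, which reads as "emergence".

```python
# Toy illustration (invented numbers, not from the paper): a smooth per-token
# improvement turns into a near-zero-then-rising "emergent" curve when scored
# with an all-or-nothing metric like exact match.
import numpy as np

model_scale = np.logspace(6, 11, 6)                       # hypothetical parameter counts
per_token_acc = 1 - 0.5 * (model_scale / 1e6) ** -0.1     # smooth power-law gain (made up)

answer_length = 10                                        # exact match needs every token right
exact_match = per_token_acc ** answer_length              # same curve, very different shape

for n, p_tok, p_exact in zip(model_scale, per_token_acc, exact_match):
    print(f"{n:9.1e} params: per-token {p_tok:.2f}, exact-match {p_exact:.3f}")
```

The per-token numbers improve by less than a factor of two across the range, while the exact-match score grows by a couple of orders of magnitude; whether that counts as genuine emergence is exactly what the debate is about.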
But if you’ve spent a lot of time with a powerful language model, I’d wager your intuition is that something is definitely going on here. You might give the AI just a handful of examples or a brief context to guide its responses, and then something unexpected happens. As you interact more, ChatGPT starts to respond with insights and ideas that show an understanding of concepts well beyond your expectations. You get a glimpse of the phenomenon of emergence in AI, where the system exhibits abilities that seem to transcend its inputs, hinting at a deeper, almost intuitive grasp of the task at hand.
If you’re as fascinated as we are by emergence and complexity in general, you’ll want to know what researchers from Princeton and DeepMind found when they applied network theory to this problem.
The paper, "A Theory for Emergence of Complex Skills in Language Models", explores how and why these emergent capabilities appear. Unlike previous work that analyzes the problem through the lens of gradient-based training, this study leverages the empirical "Scaling Laws of LLMs"—which describe how the performance of these models improves with scale—alongside a statistical framework that links the models' cross-entropy loss (a measure of error rate) to their proficiency in linguistic tasks. The authors combine two ideas: first, that bigger language models perform better, and second, a network theory that explains how these models build new, meaningful associations as they grow.
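For orientation, scaling laws of this kind are usually stated as a power law in model and data size. One widely cited empirical form looks like the expression below; the symbols are generic and the constants are fit to data, and the precise formulation the authors build on may differ in its details.

```latex
% Generic empirical scaling law: cross-entropy loss L falls as a power law in
% parameter count N and training tokens D; E, A, B, alpha, beta are fit constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```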
In the researchers' network model, links or connections represent the relationship between pieces of text and the skills needed to understand them. Each link connects a "skill node" to a "text node". A skill node represents a specific ability or understanding the AI model needs, such as grasping the concept of empathy or doing basic math. A text node, on the other hand, represents a chunk of text that the AI model might encounter or generate.
When a skill node is linked to a text node, it implies that the skill is necessary to correctly interpret, respond to, or generate the text in that node. For example, if a piece of text involves understanding how someone might feel, there would be a link between that text node and the skill node representing empathy. This linkage indicates that the AI needs the skill of empathic understanding to handle the text appropriately.
The more links a skill node has to successful text nodes (where the AI's predictions or generations are accurate), the more competent the AI is considered in that skill. Conversely, links to failed text nodes (where the AI's predictions are incorrect) show a lack of competence in the skill.
The network of links between skill and text nodes helps map out how different skills contribute to the AI's overall ability to process language. This mapping not only shows which skills the AI has mastered but also how those skills interact and combine to enable the AI to tackle complex language tasks.
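As a concrete, entirely toy illustration of that structure, the sketch below builds a small bipartite graph in Python and reads a skill's competence off as the fraction of its linked texts the model handles correctly. The skill names, the random wiring, and the random pass/fail outcomes are all invented for illustration; this is not the paper's code or notation.

```python
# Toy bipartite skill-text graph (illustrative only).
import random

random.seed(0)

skills = ["empathy", "arithmetic", "cultural_context", "poetic_structure"]
num_texts = 200

# Each text node links to the skills required to handle it (1-3 skills at random).
text_to_skills = {t: random.sample(skills, k=random.randint(1, 3)) for t in range(num_texts)}

# Whether the model handled each text correctly (a random stand-in for real evaluations).
handled_correctly = {t: random.random() < 0.7 for t in range(num_texts)}

# Competence in a skill = fraction of its linked texts that were handled correctly.
for skill in skills:
    linked = [t for t, required in text_to_skills.items() if skill in required]
    wins = sum(handled_correctly[t] for t in linked)
    print(f"{skill}: {wins}/{len(linked)} texts handled correctly "
          f"({wins / len(linked):.0%} competence)")
```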
Got it? Let’s try a metaphor. Imagine a vast library, not of books, but of ideas and the skills needed to comprehend them. Each idea is a "text node" in this library, representing a chunk of information that could range from a simple sentence to a complex narrative. Scattered among these ideas are keys or "skill nodes" that unlock the understanding of these texts. To extend the example from earlier, there are skill nodes for empathy scattered amongst the many other ideas and text strings.
The connections in this metaphorical library are like the edges (links) in a random graph. They link skills to texts. When a text node is connected to one or more skill nodes, it signifies that those particular skills are necessary to grasp the text's content. For instance, understanding a sad article might require skills in recognizing literary devices such as poetic structure, understanding cultural context, and detecting emotional cues.
What makes this approach powerful is the dynamic and emergent nature of these connections. As LLMs grow in size and are exposed to more data, the web of skills and texts expands and becomes more intricate. New connections form, and existing ones strengthen as more links accumulate. Hmm, as I write this it sounds spookily like describing how to build a brain...
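Here is one way to see, in a deliberately simplified toy simulation of my own (not the paper's model), why a smoothly growing skill-text network can make multi-skill abilities appear abruptly: a complex task needs several skills at once, and the fraction of skill combinations whose every member has been exercised enough stays near zero and then climbs steeply as the number of absorbed texts grows.

```python
# Toy simulation (heavily simplified, numbers are arbitrary): as the number of
# texts grows, the fraction of multi-skill combinations whose every member skill
# has been exercised enough rises sharply - an abrupt-looking emergence arising
# from a smooth increase in scale.
import random

random.seed(1)

num_skills = 50
skills = list(range(num_skills))
k = 4               # skills needed per complex task
min_practice = 5    # times a skill must be exercised before we count it as acquired

def fraction_of_k_skill_tasks_doable(num_texts: int, sample_size: int = 2000) -> float:
    practice = {s: 0 for s in skills}
    for _ in range(num_texts):
        for s in random.sample(skills, k=3):    # each text exercises 3 random skills
            practice[s] += 1
    acquired = {s for s, n in practice.items() if n >= min_practice}
    combos = [random.sample(skills, k=k) for _ in range(sample_size)]
    return sum(all(s in acquired for s in combo) for combo in combos) / sample_size

for num_texts in [20, 50, 100, 200, 400]:
    print(f"{num_texts:4d} texts -> {fraction_of_k_skill_tasks_doable(num_texts):.0%} "
          f"of {k}-skill tasks doable")
```

The specific numbers don't matter; the point is the threshold-like shape of the curve, which is the flavor of emergence a random-graph argument of this kind predicts.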
A particularly striking conclusion from this work is that it suggests there is even more potential for LLMs to develop complex skills without explicit exposure to those skills in their training data. This certainly challenges the notion that AI models are merely "stochastic parrots," simply regurgitating patterns seen during training. Instead, it suggests a more profound capability for synthesis and innovation within these models, and it gives us a new way to think about how these models actually get "smarter": it's not all about raw data and compute; it's also about interaction and use.
This opens up vast possibilities for AI's role in solving complex problems but also underscores the importance of understanding this emergent behavior, especially as we head towards a world of multimodal models and agentic AI.