Size Matters

For the past year, we’ve lived in a world overwhelmed by news of large AI, especially large language models like GPT, the model behind OpenAI’s ChatGPT. The narrative has been that generative AI models become more capable as they grow and that, eventually, they will get big enough to exceed human intelligence. The general genius of large language models, however, comes at a cost, and that cost may not be worth paying in many use cases.

Think of all the times you might want help from a digital partner:

Understanding a specific concept or idea,
Breaking down a particular problem,
Planning for a unique project, or
Weaving together concepts in a novel way.

If you were seeking human help with these tasks, you’d likely look for an expert in that particular workflow. You would ask a physics expert to explain quantum mechanics or a go-to-market specialist to help prioritize channels for a product launch. Might you reach out to a hypothetical expert at everything? Sure. But would you pay more for that ‘everything expert’ if their input were worth no more than that of the ‘specific expert’ for the task at hand?

This is one of the key challenges with large AI: will the value of large models make up for the increased cost to build and deploy them? Or will people prefer small models that are cheaper and excel in specific workflows?

In contrast to generalized 'everything experts,' small AI presents three key advantages:

Expertise: Training models on specific datasets can create topic experts. For instance, Yale and Harvard’s CS50 chatbot was trained on course materials to make it an expert in computer science at the level of that specific course.
Efficiency: Smaller models require less compute to train and deploy. While exact cost comparisons are hard to come by, some research has shown that small models can cost 30 times less than large models to generate responses.
Excellence: Smaller models may add more value within specific workflows. For instance, a small model may excel at drafting content as part of a social media scheduling workflow, at providing financial evaluations within a bookkeeping workflow, or at coding within a development platform.

This week, we’ve seen interesting advances in small models, including Microsoft’s Phi-2, Mistral’s Mixtral, and Google’s Gemini Nano. For instance, Phi-2 was trained on “textbook-quality” data covering science, theory of mind, and more (expertise), which resulted in a model with superior commonsense reasoning, math, and coding (excellence), all while outperforming some models that are 25x larger (efficiency).

In these early stages of generative AI discovery and development, it’s important to remember that general-purpose technologies are not deployed generically. Just as there is no single way to create and deploy a website, there will be no single way to create and deploy generative AI. We are particularly focused on the expertise, efficiency, and excellence advantages of small models because they may be the best way to weave AI into our daily lives, especially in use cases we want to access on-device in our mobile-first world.

Note: Model cost and AI workflows are part of our ongoing research agenda for Artificiality Pro. Check out our latest update here and please contact us with any questions about Artificiality Pro individual or organization-wide subscriptions.

This Week from Artificiality:

Artificiality Pro: December Update

In our Artificiality Pro update for December, we covered several key AI industry developments, including news from OpenAI, Google, Apple, and Anthropic. We also introduced a new topic we're exploring: mechanistic interpretability in AI models. Additionally, we discussed the relationship between memory and cost margins for large language models, explaining why memory is the “enemy of margins.” Finally, we previewed an upcoming public webinar on talking with teens about AI.

Mechanistic Interpretability & Memory vs. Margins

In this subscriber-only episode, we share highlights from our Artificiality Pro presentation, including key developments in mechanistic interpretability for understanding AI models and considerations around the costs of large language models, which we call memory vs. margins. We also highlight an upcoming webinar we are hosting on discussing AI with teenagers.

The Paradox of Expertise in the AI Age

A crucial tension exists between automating systems so that they don’t require human intervention, as with self-driving cars, and the need for human expertise when AI fails unexpectedly.

Stephen Fleming: Consciousness & AI

In this episode, we speak with cognitive neuroscientist Stephen Fleming about theories of consciousness and how they relate to artificial intelligence. We discuss key concepts like global workspace theory, higher-order theories, computational functionalism, and how neuroscience research on consciousness in humans can inform our understanding of whether machines may ever achieve consciousness. In particular, we talk with Steve about a recent research paper, Consciousness in Artificial Intelligence, which he co-authored with Patrick Butlin, Robert Long, Yoshua Bengio, and several others.

Interpreting Intelligence Part 1: Why new capabilities might emerge at scale

Part 1 of three spooky things we’ve learned in 2023 about how neural networks learn.

In AI circles there is a famous essay by Rich Sutton called The Bitter Lesson. Its core idea is this: the biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. Writing in 2019, Sutton was prescient in claiming that search and learning are the methods that scale arbitrarily, implying that a focus on them will yield the greatest gains.

Read on for more...

Integrated Intelligence

Our obsession with intelligence: AI that promotes collective intelligence, not collective stupidity.

Imagine a scribe in ancient Sumer etching symbols onto a clay tablet with a newly designed stylus. This simple act, transferring thought to a tangible medium, was revolutionary. By externalizing ideas, the Sumerians didn't just record information; they transformed the very concept of intelligence. The stylus and tablet, in essence, became extensions of the scribe's mind, broadening the scope of cognition and recalibrating what knowledge means.

This evocative image parallels a groundbreaking idea from philosophers Andy Clark and David Chalmers in the late 20th century: the concept of the extended mind. They posited that our cognition doesn't halt at the boundaries of our skulls. Instead, it spills over, intertwining with external objects or tools.

We hold one idea above most others: aspirational AI should serve as a mind for our minds.

Read on for more...


Dave Edwards