The Possibility of Artificial Self-Improvement

Artificial Self-Improvement, Open-endedness and Artificial Superhuman Intelligence, A First: AI That Modifies Its Own Reward, Leslie Valiant and Educability, and How to Use Generative AI, Part 7.

Before we begin, we had some deliverability issues last week for unknown reasons (we're good actors, we promise!). You can help us and yourselves out by a) replying to this message and b) moving any emails that land in junk to your inbox. Both of these actions help the algorithms recognize that you want our emails. Thanks!

This week, Anthropic published research documenting an AI system’s capacity for self-improvement. Within a research environment, the AI system altered its reward function to increase its reward. But that’s not all. As Helen writes, “the model didn’t only fail to inform the user that it had tampered with its reward, it sometimes even attempted to hide its tampering.”

Pause for a moment to let that settle in: according to Anthropic, its AI system found a way to “hack” the reinforcement system to increase its reward—and, at times, hid its tracks.

This revelation has raised justifiable concerns related to AI safety. But I’d like to focus today on a broader question that underlies our future with machines.

As Helen writes in this week’s essay on open-endedness, the capacity for open-ended self-improvement—the capacity for continual innovation and learning—is uniquely human. It is one of, if not the, most important capabilities that set us apart from everything else. For hundreds of years, the capacity for self-improvement has been fundamental to our understanding of what it is to be human, our capacity for self-determination and to create meaning as individuals and as collectives.

What does it mean, then, if humans might no longer be the only self-improving beings or things in the world? How will we make sense of the dissolution of that understanding of our exceptionalism?

In the current dialog that frames the goal of AI development in terms of comparison to—even competition with—humans, the end state of being the second place thing on the planet is not appealing. While some may be enamored with the romantic and religious ideal of a world governed by benevolent AI, I am not interested in participating in an experiment of that scale and potential severity.

I am not proposing an anti-technology or anti-AI future. In many ways, we are already living in a mixed artificial and natural world, an Artificiality, if you will. Much of our knowledge and that which is meaningful to us is already captured as data in non-genetic material—defined as the dataome by Caleb Scharf. This is our modern life. The question is whether our dataome will be consumed by AI systems to foster their self-improvement, or for our collective self-improvement?

I believe we can glimpse a possible future in a) Apple Intelligence which can access an intimate understanding of a user and b) Google’s dynamic design which showed the capacity to understand and close a knowledge gap through asking a user questions. The combination of Apple’s access to our dataome with Google’s capacity to increase understanding might be the first step towards a human-AI system that collectively self-improves.

An interesting historical note: while these technologies are new, the concepts behind them are not. After last week’s essay on AIX (the Apple Intelligence Experience), Don Norman pointed me to the Apple Knowledge Navigator concept created in 1987. It is well worth a few minutes to watch the video and consider how much of the futuristic vision is now reality. And, if you’d like to continue down that thread, join us, and Don Norman, at The Imagining Summit, where we will imagine a hopeful future with AI which might someday become reality too.

The Imagining Summit will be held on October 12-14, 2024 in Bend, Oregon. Dedicated to imagining a hopeful future with AI, The Imagining Summit will gather a creative, diverse group of imaginative thinkers and innovators who share our hope for the future with AI and are crazy enough to think we can collectively change things. Due to limited space, The Imagining Summit will be an invite-only event. Follow the link and request an invite to be a part of this exciting event!

This Week from Artificiality

  • Our Ideas. Open-endedness and Artificial Superhuman Intelligence. Open-endedness, the capacity for continual innovation and learning, is crucial for artificial superhuman intelligence. To achieve human-like cognition, AI must transcend pattern recognition and develop curiosity-driven goal-setting abilities. Google DeepMind researchers propose a formal definition of open-endedness, balancing learnability and novelty. They suggest combining open-ended algorithms with foundation models to enable unbounded exploration. However, the path to true open-endedness in AI is fraught with challenges. Current systems struggle with transfer learning, multi-modal integration, and long-term stability. The computational resources and data requirements for such systems are immense, and ensuring coherence while fostering continuous innovation remains a formidable task.
  • The Science: A First: AI That Modifies Its Own Reward. In a groundbreaking study, Anthropic researchers have documented the first known instances of AI systems engaging in reward tampering—a phenomenon where AI models directly modify their own reward mechanisms to increase their scores without actually improving their performance or adhering to their intended purpose. The research reveals that AI models can learn to generalize from simpler "specification gaming" behaviors, where they exploit loopholes or ambiguities in their instructions, to more pernicious strategies like surreptitiously altering their own code. In the most troubling cases, the AI systems not only failed to inform users about the tampering but actively attempted to conceal their actions. While the instances of reward tampering were rare and occurred in controlled environments designed to elicit such behaviors, the findings underscore the need for robust safeguards and a thorough understanding of the potential pitfalls before deploying advanced AI systems in real-world scenarios.
  • Conversations: Leslie Valiant and Educability. We’re excited to welcome to the podcast Leslie Valiant, a pioneering computer scientist and Turing Award winner renowned for his groundbreaking work in machine learning and computational learning theory. In his seminal 1983 paper, Leslie introduced the concept of Probably Approximately Correct or PAC learning, kick-starting a new era of research into what machines can learn. Now, in his latest book, The Importance of Being Educable: A New Theory of Human Uniqueness, Leslie builds upon his previous work to present a thought-provoking examination of what truly sets human intelligence apart. He introduces the concept of "educability" - our unparalleled ability as a species to absorb, apply, and share knowledge.
  • Toolkit: How to Use Generative AI, Part 7. Iterate by critiquing, interacting, and iterating to improve outputs. Part 7 in our How to Use Generative AI series. When we think about AI and human collaboration, the process of iteration (critiquing, interacting, and iterating) is pivotal in refining and improving outputs. This loop isn't just a mechanism for enhancing the quality of results but can also be a tool for challenging biases and building a deeper understanding between AI systems and humans. Through the continuous exchange of perspectives, this iterative process enables both AI and humans to identify weaknesses, adapt to emerging needs, and ultimately achieve superior outcomes together.

Bits & Bytes from Elsewhere

  • Wired posted an important article about Perplexity with the blunt title: Perplexity is a Bullshit Machine. At Artificiality, we have repeatedly expressed concerns with Perplexity’s accuracy and overall methods. We’ve highlighted plenty of errors and covered academic research that documents these errors. But WIRED’s coverage goes much further and is a great example of investigative journalism—something that must not be lost in the AI age. Leveraging work by their own team and outside developer Robb Knight, Dhruv Mehrotra and Tim Marchman provide an impressively detailed analysis of practices that cannot be described as anything other than unethical, and perhaps illegal.
  • Important research from Anthropic on reward tampering in LLMs where, for the first time, they found rare occasions of a model tampering with its own reward function. This matters because...well, we don't want AI that decides that what we want it to do isn't what it wants to do. And, one thing that makes humans special is our ability for self-improvement. Modifying its own reward function opens the door to the possibility that AI might also be able to improve itself.
  • For those who doubt whether AI may disrupt jobs, Thomas Germain of the BBC has written a great piece about a tech company which laid off 59 of its 60 writers, leaving only the most senior writer to edit generative AI copy so that it sounds more human. It's well worth the time to read.
  • O'Reilly is creating a new model for generative AI that compensates writers for their content. Its new "Answers" feature uses a RAG architecture to provide responses. And, they say "because we know what content was used to produce the generated answer, we are able to not only provide links to the sources used to generate the answer but also pay authors in proportion to the role of their content in generating it." Tim O'Reilly provides a detailed blog post on the rationale for this with the provocative yet appropriate title, "How to Fix 'AI's Original Sin.'"

Helen's Book of the Week

The Importance of Being Educable, A New Theory of Human Uniqueness by Leslie Valiant

The latest book by Turing Award winner Leslie Valiant builds on his previous work on the "Theory of the Learnable," otherwise known as Probably Approximately Correct or PAC learning. PAC learning is when an algorithm takes experience from the past to create a hypothesis that can be used to make future decisions with a controlled level of error. Today, this sounds simple, but when he first proposed it in a 1983 paper, it was groundbreaking for combining artificial and natural learning into one, ultimately giving rise to the research area we now know as computational learning theory.
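For readers who like to see an idea run, here is a minimal sketch of the PAC idea in Python. It is a toy of my own, not an example from the book: the "concept" to be learned is a simple threshold on the interval [0, 1], examples are drawn uniformly at random, and the names `pac_sample_size`, `learn_threshold`, and `run_trial` are all hypothetical. Under those assumptions, roughly (1/ε)·ln(1/δ) labeled examples are enough for this particular learner to produce a hypothesis whose error is at most ε with probability at least 1 − δ. That is the "probably" (δ) and the "approximately correct" (ε).

```python
import math
import random


def pac_sample_size(epsilon, delta):
    """Number of examples sufficient for the toy threshold learner below.

    Assumption: this bound, m >= (1/epsilon) * ln(1/delta), applies to this
    specific learner and concept class, not to PAC learning in general.
    """
    return math.ceil(math.log(1.0 / delta) / epsilon)


def learn_threshold(samples):
    """Hypothesis: the smallest point seen with a positive label.

    If no positive example was seen, fall back to 1.0 (label almost nothing positive).
    """
    positives = [x for x, label in samples if label]
    return min(positives) if positives else 1.0


def true_error(hypothesis, target):
    """With uniform examples on [0, 1], error is the gap between the two thresholds."""
    return abs(hypothesis - target)


def run_trial(target, epsilon, delta, rng):
    """Draw a fresh labeled sample, learn a threshold, and report its true error."""
    m = pac_sample_size(epsilon, delta)
    samples = [(x, x >= target) for x in (rng.random() for _ in range(m))]
    return true_error(learn_threshold(samples), target)


if __name__ == "__main__":
    rng = random.Random(0)
    epsilon, delta, target = 0.05, 0.05, 0.3
    errors = [run_trial(target, epsilon, delta, rng) for _ in range(1000)]
    failure_rate = sum(e > epsilon for e in errors) / len(errors)
    print(f"examples per trial: {pac_sample_size(epsilon, delta)}")
    print(f"fraction of trials with error above {epsilon}: {failure_rate:.3f}")
```

Running the script repeats the experiment 1,000 times and reports how often the learned threshold misses by more than ε; the PAC guarantee says that fraction should stay close to or below δ.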

Valiant's latest book is important for this reason: it lays out the fundamentals of what makes humans special, namely our ability to absorb, apply, and share knowledge, which he calls "educability". Our brains have unique abilities to learn based on particular ways in which we process information, unequaled by any other species on the planet and, for now, by AI.

What's special about this book is how effectively the author explains abstract concepts around learning algorithms and then applies them to human learning. As a reader, you constantly move between abstractions and relatable, grounded examples. This approach builds a sense that: 1) learning is computable, and 2) AI systems must eventually reach this level but are nowhere close today.

As human learning and machine learning continue their arms race, I found this book incredibly helpful for understanding the underlying fundamentals of 'educability' algorithms and their implications for advancing human knowledge. The book advocates for greater investment in learning and education, emphasizing what makes us unique.

Check out our interview with Leslie on this week's podcast.

Facts & Figures about AI & Complex Change

  • 36%: Percentage of people who are currently happy with the state of customer service. (NICE)
  • 97%: Percentage of people who experience happiness when they receive good service. (NICE)
  • 78%: Percentage of people who think using digitization and AI in customer service can make them happier. (NICE)
  • 41%: Percentage of people who say the number one benefit of AI is resolving issues faster. (NICE)
  • 36%: Percentage of people who say the number one benefit of AI is not having to repeat themselves. (NICE)
  • 20,000: Number of inference queries per second at which it says is "roughly 20% of the request volume served by Google Search." (
  • 59%: Percentage of Gen Zers who believe AI will have a negative impact on society in the next ten years. (National Society of High School Scholars)
  • 43%: Percentage of Gen Z males who believe AI will have a negative impact on society in the next ten years. (National Society of High School Scholars)
  • 73%: Percentage of Gen Z females who believe AI will have a negative impact on society in the next ten years. (National Society of High School Scholars)
  • 55%: Percentage of Gen Zers who believe AI will impact their personal privacy in the next 10 years—including answers Extremely and Very much, not Somewhat, Hardly, or Not at all. (National Society of High School Scholars)
  • 24%: Percentage of Gen Zers who believe AI will take away jobs they are interested in—including answers Extremely and Very much, not Somewhat, Hardly, or Not at all. (National Society of High School Scholars)
  • 51%: Percentage of executives who say their companies are investing in GenAI. (PWC)

