The Agentic AI Report, Graph RAG, Inside Anthropic's Claude, The Agentic Web, Karaitiana Taiuru and Indigenous AI, Weaving with Generative AI, and more!
We have a content-heavy publication for your holiday weekend (in the US, that is):
The Agentic AI report, built on our webinar on the same topic. Be sure to check out our framework for understanding and evaluating the complexity of AI agents, and download the PDF version of the report (at the bottom of the report page).
A fascinating science review of research from Anthropic which unveils some of the inner workings of Claude.
A personal ideas piece reflecting on The Agentic Web and what it might mean to be human in a world dominated by content made by machines, for machines.
A podcast interview with Māori Data Sovereignty expert Karaitiana Taiuru about indigenous peoples and AI.
Part 5 in our How to Use Generative AI series on Weave: Leveraging the Technology's Combinatorial Power.
And also Helen's Book of the Week, Bits & Bytes from Elsewhere, and Facts & Figures about AI & Complex Change.
Phew!
A Few Dates for Your Calendars
11 June 2024 at 2pm PT: Our June research briefing on our latest insights and best practices for collaborating with generative AI.
27 June 2024 at 12pm PT: An introduction to Generative AI at The Haven in Bend, Oregon. Let us know if you're local and would like to join us.
9 July 2024 at 2pm PT: Our July research briefing on learning and AI.
13 October 2024: Big news! Pencil in this date for the first Artificiality Summit in Bend, Oregon. Join us for our first in-person event and stay the week for the High Desert Innovation Week and the Bend Venture Conference. Please email us if you are interested in attending and/or contributing in any way.
And, while I have your attention, don't miss our new Services page which covers how we help your organization unlock generative AI's potential. Yes, we've been doing this work for several years for organizations around the globe. What's new is the publicity around our services—check it out and give us a call!
This Week from Artificiality
Our Research: The Agentic AI (aka AI Agents) Report. The report version of our latest research webinar on agentic AI which we define as AI systems that can perceive, reason, and act with varying complexity to extend the human mind beyond our current experience. The report covers the key components of perception, reasoning, and action, AI agent personas representing different levels of complexity, agent structures or architectures, and CX to AX—the shift from customer experience to agent experience. Read through the report, download a pdf, watch the webinar replay, and contact us with any questions.
Our Research: Graph RAG: Querying Enterprise Data with LLMs. Knowledge management and enterprise search are notoriously challenging endeavors. For years, people have yearned for a "Google for the enterprise" or an "Alexa, tell me sales from last quarter"—style capability. The primary tools for enhancing data access and knowledge management have been centered around the development of knowledge graphs. However, this is an arduous task in itself. Now, with the emergence of RAG (retrieval augmented generation), it's possible to bypass fine-tuning and directly query a corpus of information. Graph RAG doesn't necessarily replace knowledge graphs but can serve as a complementary tool, especially in scenarios where rapid, scalable, and dynamic summarization of large unstructured datasets is required.
The Science: What Anthropic Finds by Mapping Claude's Mind. In a new study, researchers at Anthropic have begun to show the inner workings of Claude 3.0 Sonnet, a state-of-the-art AI language model. By applying a technique called "dictionary learning" at an unprecedented scale, they've mapped out millions of "features"—patterns of neuron activations representing concepts—that underlie the model's behaviors. The research constitutes a major milestone in AI interpretability. It's a first glimpse into the mind of an alien intelligence, one that we've created but are only beginning to understand.
Our Ideas: The Agentic Web. The future of the internet is evolving into an Agentic Web, dominated by AI-generated content created for machines rather than humans, transforming how we interact and consume information online. But, this ideas piece is for the people, the humans, the readers with biological eyes and minds. Only you, my fellow humans, can understand why I wrote this piece and why these memories mean so much to me.
Conversations: Karaitiana Taiuru: Indigenous AI. An interview with Karaitiana Taiuru, a leading authority and highly accomplished Māori technology ethicist specialising in Māori rights with AI, Māori Data Sovereignty, and governance of emerging digital technologies and biological sciences. In our conversation, Karaitiana shares his vision for incorporating Māori culture, values, and knowledge into the development of AI technologies in a way that respects data sovereignty.
Toolkit: How to Use Generative AI, Part 5: Weave. Large Language Models are incredible in their ability to synthesize and recombine vast stores of knowledge, serving as powerful tools for creative and analytical tasks alike. Of all their skills, we think this one is perhaps the most powerful for most users. By weaving together disparate concepts into novel combinations, these models demonstrate an unparalleled capacity to "innovate", drawing from their extensive training data that spans virtually every domain of human knowledge. This capability allows LLMs to make connections across disciplines that might elude human observers, often bringing to light associations and parallels that spark new ideas or solutions to complex problems.
Bits & Bytes from Elsewhere
Stanford's Center for Research on Foundation Models released an update to its Foundation Model Transparency Index. We covered the initial release of this index in October 2023, noting that "to trust generative AI in your daily workflow, you need to know how it works, when to trust it, and when not to trust it." Overall, transparency appears to have increased, although it isn't a true apples-to-apples comparison since the developers and models included are not the same. That said, the mean score improved by 21 points to 58 (out of 100) and the top score increased to 85 from 54.
Generative AI influencers were abuzz about research comparing human vs LLM accuracy in predicting company earnings. As a former equity research analyst, this caught my attention! The research asked an interesting question: given anonymized financial statements, how well could an LLM forecast future earnings as compared to published research reports? The problem is that, even though the question is intellectually interesting, its conclusions were easily misinterpreted. Yes, the LLMs outperformed. But, the reality is that sell-side earnings forecasts aren't a good benchmark. Plenty of analysts forecast earnings that are not their actual best guess (i.e., to help a buy-rated company meet-and-beat expectations). And, to be completely honest, sell-side analysts are not the best at forecasting earnings—that award goes to buy-side analysts who work at hedge funds. I go through all of this to make this point: Generative AI is a general purpose technology and it's important to stay-in-your-lane when drawing conclusions. My mental model is based on a time when I worked for Mary Meeker, covering the internet in the late 90s. Mary was the "Queen of the Net," according to Barron's, and I think she understood it better than almost anyone, if not everyone. But, she was very careful to not overextend her expertise and collaborated with other industry analysts when she wanted to talk about the internet's effect. I did the same when covering clean energy in the mid 2000s—I worked with the utilities analyst to understand solar & utilities and the auto analyst to understand ethanol and autos. So, remember: human expertise about generative AI has more boundaries than generative AI's expertise itself.
Latimer released research on a new framework for measuring and detecting bias in LLMs. It’s important because it introduces a new metric called Bias Intelligence Quotient (BiQ), which detects, measures, and mitigates racial bias in LLMs. Unlike traditional methods, which often rely on demographic data to identify biases, BiQ operates independently of such data, making it a more flexible and effective tool. BiQ evaluates multiple factors to provide a comprehensive bias score, including diversity of training data, context sensitivity, and sentiment bias. Something that caught our eye in this research is the idea that Latimer can be used alongside other models, almost as a bias QA system. If you already have LLMs deployed in your organization, you may want to consider adding Latimer into the mix as an additional RAG tool for increasing fairness in overall AI responses.
Helen's Book of the Week
AI safety is an important topic, and this book aims to lay out in detail the theoretical and philosophical arguments for AI as something humans will not be able to control. It is a highly analytical book but remains quite readable, if not enjoyable (if you overlook the fact that we are all going to die if the author is correct).
Making AI safe is not trivial. Yampolskiy takes the reader through the ugly truth hidden in the math of AI, the fractal nature of safety fixes, the asymmetries of vulnerability, and many more factors that add up to the disaster for humanity that would be machine superintelligence. We would have no way to compete—especially if such an AI decided it wanted to compete with us. As the saying goes, there aren't any examples of a lower intelligence doing well when a higher intelligence comes along.
Not everything in this book worked for me. The chapter on consciousness was bizarre and seems to fly in the face of the current state of consciousness science. For instance, Yampolskiy claims that "computers can experience illusions, and so are conscious."
I just don't buy this and favor Sam Harris's perspective on consciousness and illusions as a counterpoint to the claim that computers' ability to experience illusions is evidence of their consciousness. Harris argues that "consciousness is the one thing in this universe that cannot be an illusion." His reasoning is grounded in the idea that the very experience of an illusion presupposes the existence of a conscious observer who is being deceived. In other words, illusions are not evidence of consciousness. Rather, consciousness is a prerequisite for the experience of illusions. While the ability to experience illusions may be a necessary condition for consciousness, it is not a sufficient one.
Perhaps the immediate takeaway from this book is that anyone considering the future of agentic AI should be thinking upfront about controllability. Controllability should be first and foremost in designers' minds, not left as an afterthought to be added ex post. This perspective becomes even more important as the capabilities of agentic AI increase.
Facts & Figures on AI and Complex Change
22%: Percentage increase in app revenue for OpenAI on the day of the GPT-4o launch—GPT-4o is free, except on mobile. (AppFigures)
$50,000,000: Annual licensing payment expected from OpenAI to News Corp for the next five years for content from the Wall Street Journal, Barron's, MarketWatch, the New York Post, etc. (Wall Street Journal)
$10,000,000: Annual licensing payment expected from OpenAI to Axel Springer for the next three years. (Wall Street Journal)
$5,000,000-$10,000,000: Annual licensing payment expected from OpenAI to the FT for an unknown number of years. (Wall Street Journal)
262%: Percentage increase in NVIDIA's quarterly revenue, as compared to the prior year quarter. (NVIDIA)
40%: Percentage of NVIDIA's data center revenue attributable to inference—aka prompt & response. (SeekingAlpha)
54%: Percentage of male applications with an 'AI in business' course credit who received an interview invitation from a job application in the UK. (Anglia Ruskin University)
28%: Percentage of male applications without an 'AI in business' course credit who received an interview invitation from a job application in the UK. (Anglia Ruskin University)
54%: Percentage of male applications with an 'AI in business' course credit who received an interview invitation from a job application in the UK. (Anglia Ruskin University)
32%: Percentage of male applications without an 'AI in business' course credit who received an interview invitation from a job application in the UK. (Anglia Ruskin University)
24%: Wage premium for jobs that require AI specialist skills in some markets. (PwC)
3.5x: Multiple by which jobs that require specialist AI skills have grown faster than all jobs since 2012. (PwC)
4.8x: Multiple by which labor productivity has grown in AI-exposed sectors versus other sectors. (PwC)
27%: Percentage by which jobs are growing more slowly in AI-exposed occupations. (PwC)
200%: Percentage reduction in false positives during the detection of fraudulent transactions against potentially compromised cards by Mastercard. (Mastercard)
300%: Percentage increase in the speed of identifying merchants at risk from, or compromised by, fraudsters by Mastercard. (Mastercard)
$14,000,000,000: Worldwide revenue from generative AI in 2020. (Bloomberg via World Economic Forum)
$1,304,000,000,000: Estimated worldwide revenue from generative AI in 2032. (Bloomberg via World Economic Forum)
25%: Percentage of webpages from 2013 to 2023 that are no longer accessible. (Pew Research)
38%: Percentage of webpages from 2013 that are no longer accessible. (Pew Research)
And, One More Thing
In my seemingly never-ending quest to quiet the #UnthinkingAIEnthusiasts...
Dave Edwards is a Co-Founder of Artificiality. He previously co-founded Intelligentsia.ai (acquired by Atlantic Media) and worked at Apple, CRV, Macromedia, Morgan Stanley, and Quartz.