The Hidden Cost of ChatGPT Is the Erosion of the Digital Commons

A new study suggests that the rise of ChatGPT may be eroding the digital commons. If users turn more and more to ChatGPT and other AI models for answers and assistance, rather than posting their questions and solutions publicly, the digital commons that these models rely on will begin to decline.

Are LLMs a Threat to Digital Public Goods?

Key Points:

  • Erosion of the Digital Commons: The rise of AI models like ChatGPT is contributing to the decline of the digital commons, foundational to the modern web, by reducing user contributions to public knowledge platforms.
  • Impact on Stack Overflow: A significant decrease in user activity on Stack Overflow was observed following the release of ChatGPT, with a 16% drop in weekly posts initially, growing to 25% within six months. This decline includes valuable and novel contributions, not just low-quality or duplicate content.
  • Feedback Loop and Proprietary Models: As users increasingly rely on AI models for information, the digital commons suffer, potentially leading to a feedback loop where open platforms diminish and proprietary models dominate, locking valuable data and knowledge in closed silos.
  • Narrowing of Information Seeking: LLMs streamline information seeking, favoring mainstream views and reducing the need for exploration. This steers users toward a flat, homogeneous information landscape and discourages further learning and the use of niche tools.
  • Synthetic Data Limitations: While synthetic data is proposed as a solution, LLM-generated data is ineffective for training AI, exacerbating data scarcity concerns and potentially slowing the generation of new open data.
  • Snake-Eating-Its-Tail Scenario: LLMs depend on human-generated data, their most important input, yet their prevalence reduces the web’s capacity to produce such data, creating a self-perpetuating dilemma.

Is AI destroying the internet? Are we running out of good data? Will AI increasingly eat its own excrement? All these questions are being asked right now and the answer to all of them feels like "yes." But how do we know and what evidence do we have? Perhaps the real question is this: what is happening to the digital commons that underpins so much of the modern web?

At the heart of the issue is how ChatGPT and other AI models are trained. These systems consume troves of publicly available data, from Wikipedia articles and Reddit posts to open-source code hosted on platforms like GitHub. They then use this data to build their knowledge base and generate outputs in response to user queries. They stand on the shoulders of digital giants: the countless contributors who have voluntarily shared their knowledge and creativity online for the benefit of all.

A new study suggests that the rise of ChatGPT may be eroding these foundations. Focusing on the popular programming Q&A platform Stack Overflow, the researchers found a significant drop in user activity following the release of ChatGPT. Using statistical models, they estimate a 16% decrease in weekly posts, with the effect growing to 25% within six months. Importantly, the decline was not limited to low-quality or duplicate content. Valuable and novel contributions were displaced too, and that is the real cause for concern.
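
For readers wondering how a figure like "16% fewer weekly posts" can be estimated at all, here is a minimal sketch of one common approach, a difference-in-differences comparison. This is an assumption for illustration only: the article does not say which model the researchers used. The function name `diff_in_diff_pct` and every number below are hypothetical, not data from the study. The idea is to compare the change on the affected platform with the change on a comparison platform over the same period, so that a general trend is not mistaken for a ChatGPT effect.

```python
import math


def diff_in_diff_pct(treated_pre, treated_post, control_pre, control_post):
    """Relative change (in percent) in the treated group, net of the control trend.

    Each argument is a list of weekly post counts (all positive).
    Working in logs turns the difference of differences into a ratio,
    which can be read as a percentage change.
    """
    def mean_log(counts):
        return sum(math.log(c) for c in counts) / len(counts)

    treated_change = mean_log(treated_post) - mean_log(treated_pre)
    control_change = mean_log(control_post) - mean_log(control_pre)
    return (math.exp(treated_change - control_change) - 1) * 100


# Hypothetical weekly post counts, made up purely to illustrate the arithmetic.
affected_before = [1000, 1000, 1000, 1000]
affected_after = [720, 720, 720, 720]      # raw drop of 28%
comparison_before = [500, 500, 500, 500]
comparison_after = [450, 450, 450, 450]    # comparison platform also fell 10%

effect = diff_in_diff_pct(
    affected_before, affected_after, comparison_before, comparison_after
)
print(f"Estimated effect: {effect:.1f}%")
```

In this toy example the affected platform's raw drop is 28%, but because the comparison platform also declined by 10%, the estimated effect attributable to the event is smaller. A real analysis would also need to account for seasonality, uncertainty, and the choice of comparison platforms.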

