Can LLMs reason and plan?

LLMs are great at coming up with approximate knowledge and ideas for potential plans. But to actually use those ideas, you need to pair the LLM with external programs that can rigorously check the plans for errors. The key is to use them as part of a bigger system.
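
Concretely, that pairing looks like a simple generate-and-test loop. The sketch below is my own illustration, not code from the research: llm_propose_plan and verify_plan are hypothetical placeholders, where the first would call an LLM API and the second would be a sound, non-LLM checker such as a plan validator, simulator, or test suite.

```python
# A minimal sketch of the "LLM as idea generator, external program as verifier" loop.
# Both callables are hypothetical placeholders, not a real API.

from typing import Callable, Optional

def solve_with_external_verifier(
    problem: str,
    llm_propose_plan: Callable[[str, str], str],          # (problem, feedback) -> candidate plan
    verify_plan: Callable[[str, str], tuple[bool, str]],  # (problem, plan) -> (ok, error report)
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a candidate plan only once the external verifier accepts it."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose_plan(problem, feedback)   # the LLM supplies the guess
        ok, feedback = verify_plan(problem, candidate)    # an external program checks it
        if ok:
            return candidate
    return None  # no verified plan found; never trust an unchecked LLM answer
```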

Prompting tips for better reasoning and planning, based on this research, are included at the end of the article.

The AI world feels like it’s divided into two camps: those who think LLMs can reason and plan, and those who don’t. This dichotomy gives rise to both over-optimism and over-pessimism about AI, neither of which is particularly helpful. So which is it?

It’s increasingly clear that LLMs aren't capable of genuine planning and reasoning. According to ASU researchers, they're essentially giant pseudo-System 1 knowledge sources, not System 2 thinkers. While it’s true that they are more than giant machine translators, it’s also true that they cannot reason autonomously.

One study put LLMs to the test on standard planning problems and found that even the best LLM, GPT-4, could only come up with fully correct plans about 12% of the time. It didn't matter which LLM was used or whether it was fine-tuned; the results were still dismal. When the researchers made the problem descriptions less obvious by changing some names, the LLMs did even worse. The picture that emerges is of LLMs pulling up plans that roughly match the problem rather than thinking things through step by step like a real planner. They're easily thrown off by surface-level changes.

Earlier research placed hope in LLMs boosting their accuracy by iteratively critiquing and refining their own solutions. The idea is that checking whether a plan works should be easier than coming up with one in the first place. But more recent work pours cold water on this optimism: it turns out that LLMs are just as bad at verifying solutions as they are at generating them. Having an LLM critique its own work can actually make things worse. Even when it stumbles onto a correct solution, it can pass right over it without recognizing that it's right.
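
For contrast, here is the same loop as the earlier sketch but with the external checker swapped out for the model itself, which is the self-critique setup these studies examined (again a hypothetical illustration, not the researchers' code). The structure is identical; the problem is that the approval signal now comes from an unreliable judge.

```python
def solve_with_self_critique(problem, llm_propose_plan, llm_critique_plan, max_rounds=5):
    """Same loop as above, but the LLM grades its own work (hypothetical placeholders)."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose_plan(problem, feedback)
        ok, feedback = llm_critique_plan(problem, candidate)  # the model judges itself
        if ok:
            return candidate  # "ok" is only the model's opinion, not a guarantee
    return None
```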

So why do so many papers claim LLMs can plan when the evidence says they can't? Planning needs two things: 1) domain knowledge about actions and their effects, and 2) the ability to assemble that knowledge into a plan that actually works, handling any tricky interactions along the way. A lot of the "LLMs can plan!" papers are really just showing that LLMs can spit out general planning knowledge. That's not the same as producing an executable plan.
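
To make that gap concrete, here is a toy Blocksworld-style check of my own (not an example from the papers): each action has preconditions and effects, and a sequence only counts as a plan if every precondition actually holds at the moment the action is taken. Reciting the action descriptions is the knowledge half; sequencing them so every check passes is the part LLMs struggle with.

```python
# Toy illustration of "knowing about actions" vs. producing an executable plan.
# Each action lists preconditions plus add/delete effects on the world state.

ACTIONS = {
    "pickup(A)":  {"pre": {"clear(A)", "ontable(A)", "handempty"},
                   "add": {"holding(A)"},
                   "del": {"clear(A)", "ontable(A)", "handempty"}},
    "stack(A,B)": {"pre": {"holding(A)", "clear(B)"},
                   "add": {"on(A,B)", "clear(A)", "handempty"},
                   "del": {"holding(A)", "clear(B)"}},
}

def plan_is_executable(state: set[str], plan: list[str]) -> bool:
    """Simulate the plan step by step; fail if any precondition is unmet."""
    for step in plan:
        spec = ACTIONS[step]
        if not spec["pre"] <= state:                     # a precondition doesn't hold
            return False
        state = (state - spec["del"]) | spec["add"]      # apply the action's effects
    return True

init = {"clear(A)", "ontable(A)", "clear(B)", "ontable(B)", "handempty"}
print(plan_is_executable(init, ["pickup(A)", "stack(A,B)"]))  # True
print(plan_is_executable(init, ["stack(A,B)", "pickup(A)"]))  # False: stacks before holding A
```

A classical planner or plan validator does exactly this kind of bookkeeping; the studies above suggest LLMs do not.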
