Can LLMs reason and plan?

LLMs are great at coming up with approximate knowledge and ideas for potential plans. But to actually use those ideas, you need to pair the LLM with external programs that can rigorously check the plans for errors. The key is to use them as part of a bigger system.
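
Concretely, that pairing looks like a simple generate-and-test loop. The sketch below is my own illustration, not code from the research: llm_propose_plan and verify_plan are hypothetical placeholders, where the first would call an LLM API and the second would be a sound, non-LLM checker such as a plan validator, simulator, or test suite.

```python
# A minimal sketch of the "LLM as idea generator, external program as verifier" loop.
# Both callables are hypothetical placeholders, not a real API.

from typing import Callable, Optional

def solve_with_external_verifier(
    problem: str,
    llm_propose_plan: Callable[[str, str], str],          # (problem, feedback) -> candidate plan
    verify_plan: Callable[[str, str], tuple[bool, str]],  # (problem, plan) -> (ok, error report)
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a candidate plan only once the external verifier accepts it."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose_plan(problem, feedback)   # the LLM supplies the guess
        ok, feedback = verify_plan(problem, candidate)    # an external program checks it
        if ok:
            return candidate
    return None  # no verified plan found; never trust an unchecked LLM answer
```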

Prompting tips for better reasoning and planning, based on this research, are included at the end of the article.

The AI world feels like it’s divided into two camps: those who think LLMs can reason and plan, and those who don’t. This dichotomy gives rise to both over-optimism and over-pessimism about AI, neither of which is particularly helpful. So which is it?

It’s increasingly clear that LLMs aren't capable of genuine planning and reasoning. According to ASU researchers, they're essentially giant pseudo-System 1 knowledge sources, not System 2 thinkers. While it’s true that they are more than giant machine translators, it’s also true that they cannot reason autonomously.

One study put LLMs to the test on standard planning problems and found that even the best LLM, GPT-4, could only come up with fully correct plans about 12% of the time. It didn't matter which LLM was used or whether it was fine-tuned; the results were still dismal. When the researchers made the problem descriptions less obvious by changing some names, the LLMs did even worse. The picture that emerges is of LLMs pulling up plans that roughly match the problem rather than thinking things through step by step like a real planner. They're easily thrown off by surface-level changes.

Earlier research placed hope in LLMs boosting their accuracy by iteratively critiquing and refining their own solutions. The idea is that checking whether a plan works should be easier than coming up with one in the first place. But more recent work pours cold water on this optimism: it turns out that LLMs are just as bad at verifying solutions as they are at generating them. Having an LLM critique its own work can actually make things worse. Even when it stumbles onto a correct solution, it can pass right over it without recognizing that it's right.
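
For contrast, here is the same loop as the earlier sketch but with the external checker swapped out for the model itself, which is the self-critique setup these studies examined (again a hypothetical illustration, not the researchers' code). The structure is identical; the problem is that the approval signal now comes from an unreliable judge.

```python
def solve_with_self_critique(problem, llm_propose_plan, llm_critique_plan, max_rounds=5):
    """Same loop as above, but the LLM grades its own work (hypothetical placeholders)."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose_plan(problem, feedback)
        ok, feedback = llm_critique_plan(problem, candidate)  # the model judges itself
        if ok:
            return candidate  # "ok" is only the model's opinion, not a guarantee
    return None
```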

So why do so many papers claim LLMs can plan when the evidence says they can't? Planning needs two things: 1) domain knowledge about actions and their effects, and 2) the ability to assemble that knowledge into a plan that actually works, handling any tricky interactions along the way. A lot of the "LLMs can plan!" papers are really just showing that LLMs can spit out general planning knowledge. That's not the same as producing an executable plan.
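
To make that gap concrete, here is a toy Blocksworld-style check of my own (not an example from the papers): each action has preconditions and effects, and a sequence only counts as a plan if every precondition actually holds at the moment the action is taken. Reciting the action descriptions is the knowledge half; sequencing them so every check passes is the part LLMs struggle with.

```python
# Toy illustration of "knowing about actions" vs. producing an executable plan.
# Each action lists preconditions plus add/delete effects on the world state.

ACTIONS = {
    "pickup(A)":  {"pre": {"clear(A)", "ontable(A)", "handempty"},
                   "add": {"holding(A)"},
                   "del": {"clear(A)", "ontable(A)", "handempty"}},
    "stack(A,B)": {"pre": {"holding(A)", "clear(B)"},
                   "add": {"on(A,B)", "clear(A)", "handempty"},
                   "del": {"holding(A)", "clear(B)"}},
}

def plan_is_executable(state: set[str], plan: list[str]) -> bool:
    """Simulate the plan step by step; fail if any precondition is unmet."""
    for step in plan:
        spec = ACTIONS[step]
        if not spec["pre"] <= state:                     # a precondition doesn't hold
            return False
        state = (state - spec["del"]) | spec["add"]      # apply the action's effects
    return True

init = {"clear(A)", "ontable(A)", "clear(B)", "ontable(B)", "handempty"}
print(plan_is_executable(init, ["pickup(A)", "stack(A,B)"]))  # True
print(plan_is_executable(init, ["stack(A,B)", "pickup(A)"]))  # False: stacks before holding A
```

A classical planner or plan validator does exactly this kind of bookkeeping; the studies above suggest LLMs do not.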
