Large language models (LLMs) have attracted enormous attention for their fluent, impressive text generation. But can they really plan and reason like humans? A new research paper by Subbarao Kambhampati and his team investigates this question and proposes the “LLM-Modulo Framework,” an approach for using LLMs effectively in planning and reasoning tasks.

Link to the paper: https://arxiv.org/abs/2402.01817

The limitations of LLMs in planning and reasoning

The researchers argue that LLMs, despite their impressive performance, cannot independently plan or self-verify. This limitation stems from the underlying nature of their training and operation. LLMs generate the next word in a sequence based on patterns learned from their training data, which is fundamentally different from the systematic reasoning required for planning tasks.

An informal account of viewing an LLM as a giant external non-veridical memory that acts as a pseudo System 1. Source: https://arxiv.org/pdf/2402.01817

The paper highlights that LLMs operate as a giant pseudo-System 1, relying on pattern matching and fast, intuitive processing. In contrast, planning and reasoning tasks typically require System 2 competencies, which involve slow, deliberate, and logical thinking.

Misunderstandings in the literature

The researchers point out that many existing papers claiming LLMs have planning and reasoning abilities are often based on misunderstandings or oversimplifications. For example, some studies test LLMs in domains that ignore interactions between subgoals, making planning appear easier than it is in reality. Others rely on human intervention through prompting to correct and refine the generated plans.

Experimental evidence

To support their position, the authors conducted experiments evaluating the performance of state-of-the-art LLMs, such as GPT-4, on various planning tasks. The results were sobering: on average, only 12% of the plans generated by the best-performing LLMs were fully correct and executable. Fine-tuning the models did not yield significant improvements. Furthermore, when the names of actions and objects in the planning domain were obfuscated, performance deteriorated even further. This suggests that LLMs are most likely retrieving plans based on surface-level similarities rather than engaging in genuine planning.
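The obfuscation idea is simple to illustrate. Below is a minimal sketch (not the paper's code) of renaming actions and objects in a plan to meaningless tokens, in the spirit of the "Mystery Blocksworld" experiments: the underlying problem is unchanged, so a genuine planner is unaffected, but pattern retrieval stops helping. The mapping here is a made-up example.

```python
import re

def obfuscate(plan: str, mapping: dict) -> str:
    """Replace every whole-word occurrence of a domain symbol."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], plan)

# Hypothetical renaming; any consistent mapping preserves the problem.
mapping = {"pickup": "xau", "stack": "qor", "block": "obj"}
plan = "pickup block-a; stack block-a block-b"
print(obfuscate(plan, mapping))  # -> xau obj-a; qor obj-a obj-b
```

Since the renamed domain is logically identical to the original, any drop in accuracy measures how much of the model's apparent planning ability depends on familiar surface forms.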

The researchers also investigated whether LLMs can effectively verify the correctness of plans and improve through self-critique. Again, the results were discouraging. LLMs performed no better at verifying solutions than generating them. Having LLMs critique their own plans did not lead to meaningful improvements in plan quality.

The LLM-Modulo Framework

While the findings underscore the limitations of LLMs in autonomous planning and reasoning, the researchers emphasize that this does not render LLMs useless for these tasks. Instead, they propose the LLM-Modulo Framework as a way to productively leverage the strengths of LLMs.

The proposed LLM-Modulo framework, where LLMs act as idea generators and various external critics that specialize in different aspects critique the candidate plan. Source: https://arxiv.org/pdf/2402.01817

The core idea behind the LLM-Modulo Framework is to combine the generative capabilities of LLMs with external “critics” or verifiers. In this framework, LLMs are used to generate candidate plans and ideas, which are then scrutinized by a bank of specialized critics. These critics can evaluate the plans based on various criteria, including hard constraints like executability and soft constraints like style and user preferences.

The framework incorporates model-based critics to ensure the soundness and correctness of the plans. These critics rely on formal domain models and planning algorithms to validate the generated plans. Additionally, LLM-based critics can be employed to assess softer aspects like style and coherence.

The critics provide feedback to the LLM, guiding it in refining and improving the generated plans iteratively. This feedback loop allows the LLM to learn from its mistakes and generate higher-quality plans over time.
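The generate-critique loop described above can be sketched in a few lines. This is an illustrative skeleton, assuming a `generate` callable standing in for the LLM and critics that return `None` on acceptance or a feedback message otherwise; none of these names come from the paper.

```python
def llm_modulo(generate, critics, max_rounds=15):
    """Generate-test loop: LLM proposes, external critics dispose."""
    feedback = []
    for _ in range(max_rounds):
        candidate = generate(feedback)          # LLM proposes a plan
        feedback = [msg for critic in critics
                    if (msg := critic(candidate)) is not None]
        if not feedback:                        # all critics accept
            return candidate
    return None                                 # no valid plan found

# Toy usage with stand-in components.
def toy_generate(feedback):
    return "fixed plan" if feedback else "draft plan"

def executability_critic(plan):
    return None if plan == "fixed plan" else "plan is not executable"

print(llm_modulo(toy_generate, [executability_critic]))  # -> fixed plan
```

The key design choice is that soundness comes entirely from the critics: the LLM is never trusted to verify its own output, only to produce new candidates conditioned on the accumulated feedback.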

Multiple roles of LLMs in the framework

The LLM-Modulo Framework allows LLMs to play multiple roles in the planning and reasoning process:

  1. Plan generation: LLMs can generate candidate plans based on the problem specification and previous feedback from the critics.
  2. Format conversion: LLMs excel at converting information between different formats. In the framework, they can translate the generated plans into representations interpretable by various critics.
  3. Problem specification assistance: LLMs can help users flesh out and refine problem specifications by asking clarifying questions and suggesting improvements.
  4. Model acquisition: LLMs can assist in acquiring the domain models used by the model-based critics. They can extract relevant information from text and engage in dialogue with domain experts to refine the models.
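To make the hard-critic role concrete, here is a sketch of a model-based executability check: it simulates the plan against a STRIPS-style domain model and rejects the first action whose preconditions do not hold. The domain encoding is illustrative, not the paper's representation.

```python
def check_plan(plan, domain, state):
    """Return None if the plan is executable, else a feedback message."""
    state = set(state)
    for step, action in enumerate(plan):
        spec = domain[action]
        missing = spec["pre"] - state
        if missing:
            return f"step {step}: {action} requires {sorted(missing)}"
        state = (state - spec["del"]) | spec["add"]
    return None

# Toy one-block domain; propositions are plain strings for brevity.
domain = {
    "pickup": {"pre": {"ontable", "handempty"}, "add": {"holding"},
               "del": {"ontable", "handempty"}},
    "putdown": {"pre": {"holding"}, "add": {"ontable", "handempty"},
                "del": {"holding"}},
}
print(check_plan(["pickup", "putdown"], domain, {"ontable", "handempty"}))
# -> None (plan is executable)
print(check_plan(["putdown"], domain, {"ontable", "handempty"}))
# -> step 0: putdown requires ['holding']
```

Note that the returned message doubles as the feedback the LLM receives for the next generation round, which is why critics that explain *why* a plan fails are more useful than critics that only say yes or no.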

Human involvement

While the LLM-Modulo Framework aims to automate much of the planning and reasoning process, human involvement is still crucial in certain aspects. Domain experts play a role in acquiring and refining the domain models used by the model-based critics. End users are involved in refining the problem specifications through interaction with the LLM.

However, the framework aims to minimize the need for humans in the time-consuming task of iterative plan critiquing. By automating the feedback loop between the LLM and the critics, the framework enables efficient and scalable plan generation and refinement.

Case studies and results

The researchers applied the LLM-Modulo Framework to several planning domains to demonstrate its effectiveness. In the Blocks World domain, a classic planning benchmark, the performance of the LLM improved to an impressive 82% within 15 feedback rounds from a model-based verifier. This showcases the framework’s ability to guide the LLM towards generating high-quality plans through iterative refinement.

In a more complex travel planning task, the LLM-Modulo Framework achieved roughly six times the performance of baseline approaches. By leveraging the generative power of the LLM and the rigorous validation of the critics, the framework was able to generate coherent and executable travel plans that satisfied various constraints and preferences.


The research paper by Kambhampati and his team offers valuable insights into the capabilities and limitations of LLMs in planning and reasoning tasks. While LLMs cannot autonomously plan or self-verify, they can still play a productive role when combined with external verifiers in frameworks like LLM-Modulo.

The LLM-Modulo Framework leverages the strengths of LLMs in generating candidate plans and ideas while ensuring the correctness and soundness of the plans through model-based critics. By automating the feedback loop and minimizing the need for human intervention, the framework enables efficient and scalable planning and reasoning.

The case studies demonstrate the potential of the LLM-Modulo Framework to extend the scope of planning to more flexible and expressive problem specifications. By harnessing the generative power of LLMs and the rigor of symbolic planning techniques, the framework offers a promising direction for tackling complex real-world planning and reasoning challenges.

As François Chollet noted in his tweet, “LLMs cannot plan and reason on their own — they’re pattern-matchers. But that doesn’t mean they can’t be useful for reasoning. They work best in conjunction with systems that can actually plan or reason — such as symbolic planners.”

Academic presentation

The authors of the research paper “LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks” (available at https://arxiv.org/abs/2402.01817) have announced that their work has been accepted for a spotlight presentation at the prestigious International Conference on Machine Learning (ICML) 2024.

A spotlight presentation is a significant recognition of the paper’s importance and impact in the field. It provides the authors with an opportunity to showcase their research to a broad audience of machine learning researchers and practitioners.

The lead author, Subbarao Kambhampati, will present the paper during a poster session on July 23, 2024. This interactive session allows attendees to engage with the authors, ask questions, and discuss the details of their work.

In addition to the spotlight presentation, the authors will also be delivering an oral presentation as part of a tutorial session on the day before, July 22. The tutorial, titled “Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks,” will provide a more in-depth exploration of the concepts and techniques presented in the paper. The tutorial details can be found at https://icml.cc/virtual/2024/tutorial/35226.




Last Update: 16/06/2024