In the rapidly evolving field of artificial intelligence, LLMs have become instrumental in advancing natural language processing (NLP) capabilities. However, despite their remarkable language comprehension and generation abilities, LLMs face significant challenges (as we all know) in areas demanding strategic and logical reasoning, particularly in mathematical contexts where precision is paramount.

The research paper “Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B: A Technical Report” introduces an innovative algorithm called MCT Self-Refine (MCTSr) to address these challenges.

The MCT Self-Refine algorithm

MCTSr is an integration of LLMs with a Monte Carlo Tree Search (MCTS) algorithm, focusing on enhancing LLMs’ performance in complex mathematical reasoning tasks, such as those encountered in mathematical Olympiads. The algorithm constructs a Monte Carlo search tree through iterative processes:

1. Selection: The algorithm employs a value function Q to rank all answers that were not fully expanded and selects the highest-valued node for further exploration and refinement using a greedy strategy.

2. Self-Refine: The selected answer undergoes optimization using the Self-Refine framework. The model generates feedback to guide the refinement process and produce an enhanced answer.

3. Self-Evaluation: The refined answer is scored using a self-reward method, and its quality value (Q-value) is computed. The model must adhere to strict scoring standards and avoid providing perfect scores to ensure reliability and fairness in scoring.

4. Backpropagation: The value of the refined answer is propagated backward to its parent node and other related nodes to update the tree’s value information.

Agents can learn decision-making and reasoning from the trial-and-error as humans do.

Figure 1: Agents can learn decision-making and reasoning from the trial-and-error as humans do. Source:

The MCTSr algorithm employs an improved Upper Confidence Bound (UCB) formula to balance exploration and exploitation, optimizing the decision-making process within the MCTS framework. The algorithm iterates through these stages until a termination condition is met, continuously refining the quality of answers and exploring new possibilities.

Experimental results

The researchers tested the MCTSr algorithm on various datasets to assess its effectiveness in solving mathematical problems. They compared the performance of LLaMA3-8B enhanced with MCTSr against state-of-the-art closed-source models like GPT-4, Claude 3, and Gemini 1.5-Pro.

GSM Benchmarks

On the GSM8K and GSM-Hard datasets, MCTSr significantly improved success rates as the number of rollouts increased, especially on the less complex GSM8K dataset. However, the more intricate GSM-Hard set showcased a performance ceiling even at higher rollouts, indicating the limits of current strategies against complex problems.

MATH Benchmark

MCTSr’s performance was evaluated across five difficulty levels on the MATH dataset. The 8-rollouts MCTSr configuration achieved a 90.16% success rate on level-1 problems and a 34.06% success rate on the most challenging level-5 problems. Overall, the 8-rollouts MCTSr solved 58.24% of the 5000 problems across all levels, demonstrating a substantial enhancement from the Zero-Shot CoT’s initial rate of 24.36%.

Olympiad-level benchmarks

MCTSr showed substantial improvements on AIME, GAIC Math Odyssey, and OlympiadBench datasets. The GAIC Math Odyssey results demonstrated the algorithm’s generalization capabilities, as this dataset had minimal overlap with the pre-training corpus of LLaMa3-8B. The algorithm’s performance increased significantly with higher rollouts, highlighting its potential for iterative refinement.

Performance of MCTSr on Olympiad-level Datasets + closed-source LLM performance on mathematical datasets

Performance of MCTSr on Olympiad-level Datasets + closed-source LLM performance on mathematical datasets. Source:

Comparison with closed-source models

MCTSr effectively enhanced the mathematical reasoning capabilities of the small-parameter open-source model LLaMa-3 to a level comparable with state-of-the-art closed-source large models. This finding underscores the potential of integrating MCTS with LLMs to bridge the gap between open-source and closed-source models in complex reasoning tasks.

Future directions and limitations

While the MCTSr algorithm has demonstrated significant potential in mathematical problem-solving, the research is still in its preliminary stages. The potential applications of MCTSr in various scenarios, such as black-box optimization problems and self-driven alignment for LLMs, remain to be explored further. Additionally, the components of MCTSr are highly scalable, necessitating ongoing development to identify and compare a broader range of component algorithms, thereby enhancing the practical potential and effectiveness of the MCTSr algorithm.


The MCT Self-Refine algorithm introduces a novel approach to enhancing the mathematical reasoning capabilities of language models by integrating Monte Carlo Tree Search with iterative answer refinement. The experimental results demonstrate the algorithm’s effectiveness across various benchmarks and difficulty levels, addressing the challenges of accuracy and reliability in complex problem-solving tasks. As research in this field continues to advance, the MCTSr algorithm sets a foundation for future AI integration, paving the way for more accurate and reliable LLM-driven applications in mathematical reasoning and beyond.

Categorized in:

Deep Learning, LLMs,

Last Update: 16/06/2024