I’ve released a collection of models designed to improve the retrieval stage in RAG systems through query expansion. The collection addresses a common challenge in search and retrieval – bridging the gap between user queries and document content.

When implementing RAG systems, the retrieval phase often struggles with vocabulary mismatch between queries and relevant documents. Query expansion helps solve this by generating semantically related search terms. For example, expanding “apple stock” to include terms like “AAPL share price” and “apple market value” increases the chance of finding pertinent information.
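To make the idea concrete, here is a minimal sketch of generating expansions with one of the fine-tuned models via the `transformers` library. The model ID and prompt format are assumptions for illustration only; check the collection page for the exact model names and recommended usage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "s-emanuilov/query-expansion-Qwen2.5-1.5B"  # hypothetical model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

query = "apple stock"
messages = [{"role": "user", "content": f"Expand this search query: {query}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the expansions deterministic for a given query.
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expected style of output: "AAPL share price, apple market value, ..."
```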

Example schema of an advanced RAG system with a query expansion step:

[Figure: query expansion model schema]

The collection includes fine-tuned language models based on the Qwen2.5 and Llama-3.2 architectures, along with their GGUF quantized versions. The GGUF variants come in multiple quantization formats (from F16 down to Q3_K_M) to balance output quality against resource requirements, which makes them suitable for production deployments where latency and memory efficiency matter.
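As a sketch of local deployment, a GGUF variant can be run with llama-cpp-python. The repo and file names below are assumptions, so substitute the ones from the collection and pick the quantization level that fits your latency and memory budget.

```python
from llama_cpp import Llama

# Download and load a quantized variant from the Hub (repo and file names are hypothetical).
llm = Llama.from_pretrained(
    repo_id="s-emanuilov/query-expansion-Llama-3.2-3B-GGUF",
    filename="*Q3_K_M.gguf",
    n_ctx=2048,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Expand this search query: apple stock"}],
    max_tokens=64,
    temperature=0.0,
)
print(result["choices"][0]["message"]["content"])
```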

The models were trained on a specialized dataset of query-expansion pairs, created through a combination of large language model generation and manual curation. The dataset is also available as part of the collection for those interested in training their own models.
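If you want to inspect or reuse the data, it can be loaded with the `datasets` library. The dataset ID below is an assumption; check the collection for the exact name and field layout.

```python
from datasets import load_dataset

ds = load_dataset("s-emanuilov/query-expansion")  # hypothetical dataset ID
print(ds)              # splits and sizes
print(ds["train"][0])  # a query paired with its expansions
```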

The complete collection is available on Hugging Face: https://huggingface.co/collections/s-emanuilov/query-expansion-678f2742c37d702adfe445e8

It includes:

- Fine-tuned query expansion models based on Qwen2.5 and Llama-3.2
- GGUF quantized variants of each model (F16 to Q3_K_M)
- The query-expansion dataset used for training

All code and models are open source, ready to be integrated into existing search and RAG pipelines.
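As a hedged sketch of what that integration might look like: the expansion model produces extra queries, each one is run through your existing retriever, and the results are merged and deduplicated. Both `expand_query` and `search` here are placeholders for your own components, not part of the released code.

```python
from typing import Callable

def retrieve_with_expansion(
    query: str,
    expand_query: Callable[[str], list[str]],  # e.g. a wrapper around one of the models above
    search: Callable[[str], list[dict]],       # your existing BM25 / dense / hybrid retriever
    top_k: int = 10,
) -> list[dict]:
    """Run the original query plus its expansions, then merge and deduplicate results."""
    seen, merged = set(), []
    for q in [query] + expand_query(query):
        for doc in search(q):
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged[:top_k]
```

A simple variant of this merge step is to weight documents by how many of the expanded queries retrieved them (a reciprocal-rank-fusion style heuristic) instead of first-come deduplication.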
