Pgvector enables vector similarity search operations in PostgreSQL.
Here’s a practical guide to working with vector queries:
Basic vector queries
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with a vector column
CREATE TABLE items (
id bigserial PRIMARY KEY,
embedding vector(3)
);
-- Search for nearest neighbors using L2 distance
SELECT id, embedding <-> '[3,1,2]' as distance
FROM items
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
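The nearest-neighbor query returns no rows until the table is populated. As a minimal sketch, assuming the items table defined earlier and a few sample vectors chosen purely for illustration, the full round trip might look like this:

```sql
-- Sample rows (illustrative values, not from the original guide)
INSERT INTO items (embedding) VALUES
    ('[1,2,3]'),
    ('[4,5,6]'),
    ('[3,1,2]');

-- Nearest neighbors of [3,1,2] by L2 distance;
-- the row holding the identical vector sorts first with distance 0
SELECT id, embedding <-> '[3,1,2]' AS distance
FROM items
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
```

Without an index this is an exact search: PostgreSQL scans every row and sorts by the computed distance.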
Common distance operators
- <-> – L2 (Euclidean) distance
- <=> – Cosine distance
- <#> – Negative inner product (multiply by -1 to get the inner product)
- <+> – L1 (Manhattan) distance
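The other operators plug into the same query shape. The sketch below assumes the items table from earlier; the column aliases are arbitrary names chosen for illustration. Note that <#> returns the negative inner product, because PostgreSQL index scans only support ascending order on operators, so the result is negated to recover the actual inner product.

```sql
-- Cosine distance; cosine similarity is 1 minus the cosine distance
SELECT id, 1 - (embedding <=> '[3,1,2]') AS cosine_similarity
FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 5;

-- <#> yields the negative inner product, so multiply by -1 for the inner product
SELECT id, (embedding <#> '[3,1,2]') * -1 AS inner_product
FROM items
ORDER BY embedding <#> '[3,1,2]'
LIMIT 5;

-- L1 (Manhattan) distance; requires a pgvector release that ships the <+> operator
SELECT id, embedding <+> '[3,1,2]' AS l1_distance
FROM items
ORDER BY embedding <+> '[3,1,2]'
LIMIT 5;
```

In each case the ORDER BY uses the raw operator in ascending order, which is the form an index can serve.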
Indexing for performance
For better query performance, you can add an HNSW index:
-- Create HNSW index for L2 distance
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
-- Create index for cosine distance
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
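An index is only used when its operator class matches the operator in the query: vector_l2_ops for <->, vector_cosine_ops for <=>, and vector_ip_ops for <#>. HNSW also exposes build-time and query-time settings. The sketch below is an alternative to the plain CREATE INDEX statements, shown with what are, to the best of my knowledge, pgvector's documented default values; treat the numbers as starting points rather than recommendations.

```sql
-- Build-time parameters: m is the maximum number of connections per layer,
-- ef_construction is the candidate list size used while building the graph
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

-- Query-time parameter: a larger candidate list improves recall
-- at the cost of speed (default is 40)
SET hnsw.ef_search = 100;

-- Check whether the planner actually picks the index for this query
EXPLAIN ANALYZE
SELECT id
FROM items
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
```

On very small tables the planner may still prefer a sequential scan, so the EXPLAIN output is more informative once the table holds a realistic number of rows.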
Additional features
- Stores vectors with up to 16,000 dimensions; indexes on the vector type support up to 2,000 dimensions
- Allows vector aggregation with avg() and sum()
- Enables filtering combined with vector search
- Provides exact and approximate nearest neighbor search
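Filtering and aggregation can be sketched as follows. The category_id column is hypothetical and added here only for illustration; it is not part of the table defined earlier.

```sql
-- Hypothetical filter column, for illustration only
ALTER TABLE items ADD COLUMN category_id int;

-- Nearest neighbors restricted to one category
SELECT id, embedding <-> '[3,1,2]' AS distance
FROM items
WHERE category_id = 123
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;

-- Element-wise average of all stored vectors
SELECT AVG(embedding) FROM items;

-- Element-wise average per category
SELECT category_id, AVG(embedding)
FROM items
GROUP BY category_id;
```

With an approximate index, the filter is applied after the index scan, so a selective WHERE clause can return fewer rows than the LIMIT asks for. Without an index, the filtered search is exact.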
Performance tips
- Add indexes after bulk data loading
- Use appropriate distance metrics for your use case
- Consider increasing maintenance_work_mem for faster index building
- Monitor index size with pg_size_pretty(pg_relation_size('index_name'))
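The last two tips can be combined into a single session, sketched below. The memory value and the index name items_embedding_hnsw_idx are assumptions made for illustration; pick a value that fits the memory actually available on the server.

```sql
-- Give index builds more memory for this session only (illustrative value)
SET maintenance_work_mem = '2GB';

-- Build the index after the bulk load, with an explicit name (hypothetical)
-- so that it is easy to reference afterwards
CREATE INDEX items_embedding_hnsw_idx
ON items USING hnsw (embedding vector_l2_ops);

-- Report the on-disk size of the index in human-readable form
SELECT pg_size_pretty(pg_relation_size('items_embedding_hnsw_idx')) AS index_size;
```

Naming the index explicitly avoids having to look up the auto-generated name before calling pg_relation_size.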
Pgvector is particularly useful for applications like semantic search, recommendations, and AI feature storage where vector similarity operations are needed.
Vector operations can be computationally intensive, so proper indexing and query optimization are important for production workloads.
More information can be found in the pgvector repository.
Check out some of the more advanced PostgreSQL tutorials: