What Is a Vector Database: Step-by-Step Guide

A vector database is a specialized database engine that stores high-dimensional embeddings and retrieves them by semantic similarity in milliseconds using approximate nearest neighbor (ANN) search. Unlike traditional relational databases optimized for exact keyword matches and SQL joins, vector databases index embeddings as points in vector space and find the k-nearest neighbors to a query vector using algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File) instead of B-tree lookups.

In production AI systems, vector databases power retrieval-augmented generation (RAG), recommendation engines, semantic search, and anomaly detection. They solve one core problem: given a query embedding from a user's input or a model's output, find the most contextually relevant stored embeddings and their associated metadata with sub-100 ms latency, even at billion-vector scale.

Why Vector Databases Exist: The Embedding Problem

A traditional relational database without a vector index cannot efficiently find similar embeddings. If you store 384-dimensional embeddings as rows in PostgreSQL and query for the 10 most similar vectors using Euclidean distance, the database must compute the distance to every row and then sort or select the smallest results. That is at least O(n) work per query, which becomes prohibitive at millions of vectors.
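The full-scan cost described above can be made concrete with a minimal sketch. It uses only the Python standard library; the function name `brute_force_knn` and the sample vectors are illustrative assumptions, not part of any database API.

```python
import heapq
import math

def brute_force_knn(query, vectors, k):
    """Exact k-nearest-neighbor search: one distance computation per stored vector."""
    scored = ((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    # nsmallest keeps only k candidates in memory, but it still has to
    # consume all n distances, so the work grows linearly with n.
    return heapq.nsmallest(k, scored)

# Hypothetical 2-dimensional vectors; real embeddings have hundreds of dimensions.
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = brute_force_knn((1.0, 1.0), vectors, k=2)
print(neighbors)  # list of (distance, index) pairs, closest first
```

Every query touches every stored vector, which is the behavior an ANN index is meant to avoid.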

Vector databases solve this by pre-indexing embeddings into cluster, tree, or graph structures that partition vector space. HNSW, for example, builds a hierarchical graph in which each node links to nearby vectors; a search then visits only a small subset of nodes before converging on the nearest neighbors. This reduces the work per query from O(n) to roughly O(log n) in practice, at the cost of occasionally missing a true neighbor, and makes semantic search practical at very large scale.
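To illustrate why a graph index visits only a small subset of nodes, here is a simplified sketch of greedy search over a single-layer proximity graph. It is not HNSW itself: there is no layer hierarchy and no candidate queue, and the graph is built by brute force. The names `build_knn_graph` and `greedy_search` and the grid data are assumptions chosen for illustration.

```python
import math

def build_knn_graph(points, m):
    """Link every point to its m nearest points (brute force, done once at index time)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:m]
    return graph

def greedy_search(query, points, graph, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current, computed = entry, 1
    current_dist = math.dist(query, points[current])
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            computed += 1
            d = math.dist(query, points[j])
            if d < best_dist:
                best, best_dist = j, d
        if best == current:  # no neighbor is closer: a local minimum was reached
            return current, computed
        current, current_dist = best, best_dist

# A 10 x 10 grid of 2-D points stands in for stored embeddings.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
graph = build_knn_graph(points, m=4)
nearest, computed = greedy_search((6.2, 3.1), points, graph)
print(points[nearest], f"found with {computed} distance computations for {len(points)} points")
```

On this regular grid the greedy walk reaches the true nearest point. On real data it can stop at a local minimum, which is one reason production indexes add layers and keep a list of candidates instead of a single current node.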

Embeddings: The Input to Vector Databases

An embedding is a fixed-size array of floats (typically 384, 768, or 1536 dimensions) that represents the semantic meaning of text, images, or code. Dedicated embedding models (OpenAI's text-embedding-ada-002, Sentence Transformers, Voyage AI), often used alongside LLMs such as GPT-4, Claude, or Gemma, encode documents, user queries, or image patches into embeddings such that similar content is close in vector space.

For example:

from sentence_transformers import SentenceTransformer

# Load a small open-source embedding model (downloaded on first use).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([
    "Vector databases store embeddings efficiently",
    "HNSW is a graph-based indexing algorithm",
    "The cat sat on the mat",
])
# embeddings.shape == (3, 384)
# embeddings[0] and embeddings[1] are both about vector search, so expect a
# noticeably higher cosine similarity between them than between
# embeddings[0] and embeddings[2] (exact values depend on the model version).

Vector databases store these embeddings alongside metadata (document ID, URL, creation timestamp, category tags) and allow retrieval by query embedding: "Find the 5 documents most similar to this query."
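The similarity scores mentioned in the example's comments are cosine similarities. A minimal sketch of that computation follows, using hand-picked 3-dimensional vectors as hypothetical stand-ins for real embeddings so that it runs without downloading a model.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical "embeddings" chosen by hand, not produced by any model.
doc_vector_db = [0.9, 0.1, 0.0]
doc_hnsw = [0.8, 0.3, 0.1]
doc_cat = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(doc_vector_db, doc_hnsw)
sim_unrelated = cosine_similarity(doc_vector_db, doc_cat)
print(f"related: {sim_related:.2f}, unrelated: {sim_unrelated:.2f}")
```

Vectors that point in nearly the same direction score close to 1, while nearly orthogonal vectors score close to 0, which is what "close in vector space" means under this metric.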

Core Concepts: ANN, Recall, and Latency

Vector databases balance three metrics:

Recall: the proportion of the true k nearest neighbors that the ANN algorithm returns. A vector DB might achieve 95% recall (95 of the true 100 nearest neighbors) in exchange for fast search. Recall is tunable via index parameters.

Latency: the time to answer a query. Production RAG systems often target <100 ms per query, including embedding generation time.

Throughput: queries per second. A production system handling 1,000 concurrent users might need 500–5,000 QPS.

The trade-off: tuning for low latency (fewer index nodes visited per query) reduces recall; increasing recall increases latency. Vector databases expose knobs to tune this trade-off based on your SLA.
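The trade-off can be observed in a toy IVF-style index, sketched below under several simplifying assumptions: centroids are simply the first 16 vectors rather than being trained with k-means, the data is random 2-D points, and the "latency" proxy is the number of vectors scanned. The names `build_ivf`, `ivf_search`, `recall_at_k`, and the `nprobe` knob are illustrative, not any particular engine's API.

```python
import math
import random

def build_ivf(vectors, centroids):
    """Assign every vector to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(buckets, key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(buckets, key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: math.dist(query, vectors[idx]))
    return candidates[:k], len(candidates)

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
vectors = [(rng.random(), rng.random()) for _ in range(2000)]
centroids = vectors[:16]  # simplification: a real IVF index trains its centroids
buckets = build_ivf(vectors, centroids)
query, k = (0.5, 0.5), 10

# Probing every bucket scans all vectors, so it yields the exact answer.
exact_ids, _ = ivf_search(query, vectors, centroids, buckets, k, nprobe=len(centroids))
results = {}
for nprobe in (1, 4, 16):
    ids, scanned = ivf_search(query, vectors, centroids, buckets, k, nprobe)
    results[nprobe] = (recall_at_k(ids, exact_ids), scanned)
    print(f"nprobe={nprobe:2d} scanned={scanned:4d} recall={results[nprobe][0]:.2f}")
```

Raising `nprobe` scans more vectors and can only keep or improve recall, which mirrors the knob that real engines expose under names of their own.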

Vector Databases vs. Traditional Databases

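A relational database can store vectors (PostgreSQL through the pgvector extension, MySQL 9.0+ through its VECTOR type), but without a vector index every similarity query is a full scan. pgvector does offer IVFFlat and HNSW indexes, which bring approximate search into PostgreSQL, though tuning and scaling them is left to you. A document store such as Elasticsearch indexes dense_vector fields for kNN search, but vector search is an addition to its text-search core rather than its primary design goal.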

| Aspect | PostgreSQL + pgvector | Vector DB (Pinecone, Weaviate) |
| --- | --- | --- |
| Embedding search latency | 500 ms–5 s (full scan, no vector index) | 5–50 ms (indexed) |
| Max practical vectors | 10M–100M | 100M–10B+ |
| Metadata filtering | SQL WHERE clause | Native filtering + search |
| Replication / HA | Built-in | Varies (Qdrant: built-in, Pinecone: managed) |
| Managed / Self-hosted | Both options (self-hosted, or a managed PostgreSQL service that ships pgvector) | Both options |

A vector database is purpose-built for ANN search and is designed to scale to billions of vectors at low latency; in a traditional database, vector search is an extension rather than the core design.

Core Architecture: How Vector Databases Work

  1. Ingestion: Accept embedding vectors and metadata via API or bulk import. Store in memory and disk with optional replication.
  2. Indexing: Build ANN index structure (HNSW graph, IVF clustering, etc.) on background thread. Enables fast similarity search.
  3. Query: Accept a query vector, traverse ANN structure to find nearest neighbors, apply metadata filters, and return top-k results with distances.
  4. Update / Delete: Modify metadata or remove embeddings. Re-index on demand or via incremental indexing.
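The four steps above can be sketched as a tiny in-memory class. This is a hypothetical illustration only: `TinyVectorStore` and its method names are made up for this example, it performs an exhaustive scan instead of building an ANN index, and it has no persistence or replication.

```python
import math

class TinyVectorStore:
    """In-memory stand-in for the ingest / query / update / delete cycle (no ANN index)."""

    def __init__(self):
        self._rows = {}  # item_id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        # Insert a new row, or overwrite the vector and metadata of an existing one.
        self._rows[item_id] = (list(vector), dict(metadata or {}))

    def delete(self, item_id):
        self._rows.pop(item_id, None)

    def query(self, vector, top_k=3):
        # Exhaustive scan; a real engine would traverse its ANN index here instead.
        scored = [(math.dist(vector, vec), item_id, meta)
                  for item_id, (vec, meta) in self._rows.items()]
        scored.sort(key=lambda row: row[0])
        return scored[:top_k]

store = TinyVectorStore()
store.upsert("doc-1", [0.1, 0.9], {"category": "tech"})
store.upsert("doc-2", [0.8, 0.2], {"category": "sports"})
store.upsert("doc-3", [0.2, 0.8], {"category": "tech"})
store.delete("doc-2")
hits = store.query([0.0, 1.0], top_k=2)
print([(item_id, round(dist, 3)) for dist, item_id, _ in hits])
```

The interface shape (upsert, query with top-k, delete) is common across engines, while the indexing step hidden behind `query` is where they differ.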

Advanced vector databases also support:

  • Metadata filtering: Pre-filter vectors by metadata (e.g., category == "tech" AND date > 2026-01-01) before ANN search.
  • Batch operations: Upsert thousands of vectors and metadata in one call for efficiency.
  • TTL (Time-to-Live): Auto-expire vectors after N days.
  • Namespaces / Collections: Partition vectors logically (per tenant, per dataset).
  • Backup and replication: Persist snapshots and replicate to standby replicas.
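As a rough sketch of the metadata pre-filtering item above, the following applies a predicate first and ranks only the surviving rows. The row layout, the `filtered_search` helper, and the sample data are assumptions for illustration; real engines integrate filtering with their index rather than scanning a list.

```python
import math
from datetime import date

# Hypothetical rows: a vector plus the metadata fields used by the filter.
rows = [
    {"id": "a", "vector": (0.1, 0.9), "category": "tech", "date": date(2026, 3, 1)},
    {"id": "b", "vector": (0.1, 0.8), "category": "sports", "date": date(2026, 2, 1)},
    {"id": "c", "vector": (0.2, 0.9), "category": "tech", "date": date(2025, 6, 1)},
    {"id": "d", "vector": (0.7, 0.3), "category": "tech", "date": date(2026, 1, 15)},
]

def recent_tech(row):
    """Equivalent of: category == "tech" AND date > 2026-01-01."""
    return row["category"] == "tech" and row["date"] > date(2026, 1, 1)

def filtered_search(query, rows, predicate, top_k):
    """Pre-filter: drop rows that fail the metadata predicate, then rank the survivors."""
    survivors = [row for row in rows if predicate(row)]
    survivors.sort(key=lambda row: math.dist(query, row["vector"]))
    return [row["id"] for row in survivors[:top_k]]

result = filtered_search((0.1, 0.9), rows, recent_tech, top_k=2)
print(result)
```

Rows "b" and "c" are close to the query but never reach the ranking step because they fail the filter, which is the defining property of pre-filtering.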

Common Use Cases in Production

Retrieval-Augmented Generation (RAG): Embed documents, store them in a vector DB, and retrieve the top-k relevant documents as context before prompting an LLM. This addresses the problem that a full corpus is too large for the model's context window: by searching the vector DB, you pass only the most relevant documents (for example, the top 5), keeping the prompt small.
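A minimal sketch of the retrieve-then-prompt flow follows. Everything here is an assumption made so the example runs offline: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, retrieval is a plain sort rather than a vector DB query, and the prompt template is invented.

```python
import math

VOCAB = ["vector", "database", "index", "cat", "mat", "graph"]

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts over a tiny vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def retrieve(question, chunks, top_k):
    """Rank stored chunks by similarity to the question and keep the best top_k."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, toy_embed(chunk)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "A vector database stores embeddings and builds an index over them",
    "The cat sat on the mat",
    "HNSW is a graph index used by many vector engines",
]
question = "how does a vector database index work"
top_chunks = retrieve(question, chunks, top_k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Only the two relevant chunks reach the prompt; the unrelated sentence is left out, which is how retrieval keeps the context small.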

Semantic Search: Index all product descriptions or articles. User enters a query; it is embedded, and the vector DB returns the top-10 most similar products or articles in <50ms.

Recommendation Engines: Store user and item embeddings derived from behavior. At inference time, embed the user's current interaction and retrieve the nearest item embeddings, which tend to be items that users with similar behavior engaged with.

Anomaly Detection: Cluster embeddings of network traffic, logs, or transactions. Incoming events whose embeddings lie far from the nearest cluster centroid are flagged as anomalies.
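The centroid-distance idea can be sketched in a few lines. This assumes a single cluster, hypothetical 2-D "embeddings", and a hand-picked threshold; in practice the threshold would be derived from the distribution of distances seen on normal data.

```python
import math

def centroid(vectors):
    """Component-wise mean of the vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def is_anomaly(vector, center, threshold):
    """Flag a vector whose Euclidean distance from the centroid exceeds the threshold."""
    return math.dist(vector, center) > threshold

# Hypothetical embeddings of "normal" events and an arbitrary threshold.
normal_events = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0)]
center = centroid(normal_events)
THRESHOLD = 0.5

incoming = [(1.05, 0.95), (4.0, 3.5)]
flags = [is_anomaly(event, center, THRESHOLD) for event in incoming]
print(flags)
```

The first incoming event sits next to the centroid and passes; the second lies far outside the cluster and is flagged.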

Image Search: Embed images via a vision model (CLIP, ResNet). User uploads an image; embed it, query the vector DB, get visually similar images.

Key Takeaways

  • A vector database is specialized for high-dimensional embedding search, not general-purpose SQL queries.
  • ANN algorithms (HNSW, IVF) enable sub-100-millisecond search at billion-scale by trading some recall for speed.
  • Vector databases solve the production problem traditional databases cannot: semantic similarity at scale.
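  • Metadata filtering during search narrows results to the records you care about, although very selective filters can lower recall or raise latency depending on how the engine combines filtering with the ANN index.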
  • Core use cases are RAG, semantic search, recommendations, and anomaly detection in modern AI systems.

Frequently Asked Questions

Can I use PostgreSQL with pgvector instead of a dedicated vector database?

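PostgreSQL with pgvector can store and search vectors. Without a vector index, search latency is poor (500ms–5s for millions of vectors, because every query scans the entire table). pgvector also supports IVFFlat and HNSW indexes, which make approximate search much faster, while dedicated vector databases are built around such indexes and report 5–50ms latency. pgvector is a reasonable choice for prototypes and smaller collections (for example, under 1M vectors); consider a dedicated vector DB when production RAG or real-time search outgrows it.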

What is the difference between Pinecone and Weaviate?

Pinecone is a managed vector database (serverless, operated by Pinecone). You send vectors, it stores and indexes them, and you query via API. Weaviate is an open-source vector database that you can self-host (for example, on Kubernetes) or run through its managed cloud offering. Pinecone excels for fast time-to-value and multi-tenancy; Weaviate excels for privacy-critical deployments and customization.

How do I create embeddings for my documents?

Use an embedding model: OpenAI's text-embedding-ada-002 (1536 dims), Sentence Transformers' all-MiniLM-L6-v2 (384 dims, open-source), or Voyage AI's embedding models. Chunk documents into segments that fit the model's input limit (512 tokens or fewer is a common choice; some models, including all-MiniLM-L6-v2, truncate input earlier than that), embed each chunk, and store the embedding vector plus metadata (chunk ID, source document, timestamp) in the vector DB.
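A simple chunking step might look like the sketch below. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption: real token counts depend on the model's tokenizer. The `chunk_words` helper, the overlap value, and the metadata field names are illustrative.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words whitespace-separated words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
        start += step
    return chunks

# A synthetic 450-word document stands in for real content.
document = " ".join(f"word{i}" for i in range(450))
records = [
    {"chunk_id": i, "source": "example.txt", "text": chunk}  # hypothetical metadata fields
    for i, chunk in enumerate(chunk_words(document))
]
print(len(records), [len(r["text"].split()) for r in records])
```

Each record would then be embedded and upserted together with its metadata; the overlap keeps sentences that straddle a boundary retrievable from either side.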

What is the typical embedding dimensionality and does it matter?

Common dimensions are 384 (small models), 768 (medium), and 1536 (large models such as text-embedding-ada-002). Smaller dimensions (384) reduce storage and search latency; larger dimensions (1536) can capture more semantic detail and may improve retrieval quality. Start with 384 or 768 for production cost-efficiency. Dimensionality also affects storage: 384 dims × 4 bytes per float32 × 1M vectors ≈ 1.5 GB of raw vectors, before index overhead.
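The storage arithmetic generalizes to other dimensions and float widths. The helper below is a back-of-the-envelope sketch covering raw vector data only; index structures and metadata add overhead that varies by engine.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value):
    """Raw storage for the vectors alone; index structures and metadata add more."""
    return num_vectors * dims * bytes_per_value

for dims in (384, 768, 1536):
    f32_gb = raw_vector_bytes(1_000_000, dims, 4) / 1e9  # float32
    f64_gb = raw_vector_bytes(1_000_000, dims, 8) / 1e9  # float64
    print(f"{dims} dims: {f32_gb:.2f} GB as float32, {f64_gb:.2f} GB as float64")
```

Storage scales linearly with both dimensionality and vector count, so moving from 384 to 1536 dimensions quadruples the raw footprint.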

Further Reading