
PgVector for AI Memory in Production Applications

Posted on November 16, 2025 (updated December 1, 2025) by ivan.turkovic

Introduction

As AI moves from experimentation into real products, one challenge appears over and over again: memory. Large language models (LLMs) are incredibly capable, but they can’t store long-term knowledge about users or applications out of the box. They respond only to what they see in the prompt, and once the prompt ends, that memory disappears.

This is where vector databases, and especially PgVector, step in.

PgVector is a PostgreSQL extension that adds first-class vector similarity search to a database you probably already use. Its rise in popularity, especially in production AI systems, has made it one of the simplest and most powerful ways to build AI memory.

This post is a deep dive into PgVector: how it works, why it matters, and how to implement it properly for real LLM-powered features.


What Is PgVector?

PgVector is an open-source PostgreSQL extension that adds support for storing and querying vector data types. These vectors are high-dimensional numerical representations (embeddings) generated by AI models.

Examples:

  • A sentence embedding from OpenAI might be a vector of 1,536 floating‑point numbers.
  • An image embedding from CLIP might be 512 or 768 numbers.
  • A user profile embedding might be custom‑generated from your own model.

PgVector lets you:

  • Store these vectors
  • Index them efficiently
  • Query them using similarity search (cosine, inner product, Euclidean)

This enables your LLM applications to:

  • Retrieve knowledge
  • Add persistent memory
  • Reduce hallucinations
  • Add personalization or context
  • Build recommendation engines

And all of that without adding a new, complex piece of infrastructure, because it all runs inside PostgreSQL.


How PgVector Works

At its core, PgVector introduces a new column type:

vector(1536)

You decide the dimension based on your embedding model. PgVector then stores the vector and allows efficient search using three distance metrics, each with its own query operator (demonstrated in the sketch below):

  • Cosine distance (1 − cosine similarity), via the <=> operator
  • Inner product (negated as <#>, so that smaller means more similar)
  • Euclidean (L2), via <->
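
Here is a self-contained sketch of these operators on toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

-- Toy vectors cast to the vector type; no table required.
SELECT '[1, 2, 3]'::vector <-> '[4, 5, 6]'::vector AS l2_distance,
       '[1, 2, 3]'::vector <#> '[4, 5, 6]'::vector AS neg_inner_product,
       '[1, 2, 3]'::vector <=> '[4, 5, 6]'::vector AS cosine_distance;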

Similarity Search

Similarity search means: given an embedding vector, find the stored vectors that are closest to it.

This is crucial for LLM memory.

Instead of asking the model to “remember” everything or hallucinating answers, we retrieve the most relevant facts, messages, documents, or prior interactions before the LLM generates a response.

Indexing

PgVector supports two main index types:

  • IVFFlat (approximate search with fast index builds and low memory use – a solid production default)
  • HNSW (Hierarchical Navigable Small World – a graph-based index with better query speed and recall on large datasets, at the cost of slower builds)

Example index creation:

CREATE INDEX ON memory USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
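
If your pgvector version supports it (0.5.0 or newer), an HNSW index is a drop-in alternative. The parameters below are pgvector’s defaults, shown only to make the tuning knobs visible; treat this as a sketch, not a recommendation:

-- HNSW alternative; m and ef_construction control graph density and build effort.
CREATE INDEX ON memory USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

For IVFFlat, the probes setting trades recall for speed at query time (the default is 1):

SET ivfflat.probes = 10;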

Using PgVector With Embeddings

Step 1: Generate Embeddings

You generate embeddings from any model:

  • OpenAI Embeddings
  • Azure
  • HuggingFace models
  • Cohere
  • Llama.cpp
  • Custom fine‑tuned transformers

Example (OpenAI):

POST https://api.openai.com/v1/embeddings

{
  "model": "text-embedding-3-large",
  "input": "Hello world"
}

This returns a vector like:

[0.0213, -0.0045, 0.9983, ...]

Step 2: Store Embeddings in PostgreSQL

A table for memory might look like:

CREATE TABLE memory (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

Insert data:

INSERT INTO memory (content, embedding)
VALUES ('User likes Japanese and Mexican cuisine', '[0.234, -0.998, ...]');

Step 3: Query Similar Records

SELECT content, (embedding <=> '[0.23, -0.99, ...]') AS distance
FROM memory
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 5;

This returns the top 5 most relevant memory snippets, which can then be added to the prompt context.


Storing Values for AI Memory

What You Store Depends on Your Application

You can store:

  • Chat history messages
  • User preferences
  • Past actions
  • Product details
  • Documents
  • Errors and solutions
  • Knowledge base articles
  • User profiles

Recommended Structure

A flexible structure:

{
  "type": "preference",
  "user_id": 42,
  "source": "chat",
  "topic": "food",
  "tags": ["japanese", "mexican"]
}

This gives you the ability to:

  • Filter search by metadata
  • Separate memories per user
  • Restrict context retrieval by type
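
For example, a retrieval query scoped to a single user’s preferences might look like this (the metadata keys come from the JSON structure above; the vector literal is a placeholder):

-- Nearest neighbours, restricted by JSONB metadata before ranking.
SELECT content
FROM memory
WHERE metadata->>'user_id' = '42'
  AND metadata->>'type' = 'preference'
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 5;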

Temporal Decay (Optional)

You can implement ranking adjustments:

  • Recent memories score higher
  • Irrelevant memories score lower
  • Outdated memories auto‑expire

This creates human‑like memory behavior.
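
A minimal sketch of recency weighting, using the created_at column from the table above; the 0.01-per-day penalty is purely illustrative and needs tuning for your domain:

-- Blend vector distance with age (in days) so fresh memories rank higher.
SELECT content,
       (embedding <=> '[0.23, -0.99, ...]')
         + 0.01 * EXTRACT(EPOCH FROM NOW() - created_at) / 86400.0 AS score
FROM memory
ORDER BY score
LIMIT 5;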


Reducing Hallucinations With PgVector

LLMs hallucinate when they lack context.

Most hallucinations are caused by missing information, not by model failure.

PgVector solves this by ensuring the model always receives:

  • The top relevant facts
  • Accurate summaries
  • Verified data

Retrieval-Augmented Generation (RAG)

Here’s how it transforms a prompt:

Without RAG:

“Tell me about Ivan’s garden in Canada.”

With RAG:

“Tell me about Ivan’s garden in Canada. Here are relevant facts from memory: the garden is 20 m², it is located in Canada, and it is used for planting vegetables.”

The model no longer needs to guess.

Why This Reduces Hallucination

Because the model:

  • Is not guessing user data
  • Only completes based on retrieved facts
  • Gets guardrails through data-driven knowledge
  • Becomes far more predictable

PgVector acts like a mental database for the AI.


Adding PgVector to a Production App

Here’s the blueprint.

1. Install the extension

CREATE EXTENSION IF NOT EXISTS vector;

2. Create your memory table

Use the structure that fits your domain.

3. Create an index

CREATE INDEX memory_embedding_idx
ON memory USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

4. Create a Memory Service

Your backend service should:

  • Accept content
  • Generate embeddings
  • Store them with metadata

And another service should:

  • Take an embedding
  • Query top-N matches
  • Return the context
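
Whatever language these services are written in, they boil down to two SQL statements against the memory table (the values here are placeholders):

-- Write path: persist content, its embedding, and metadata.
INSERT INTO memory (content, embedding, metadata)
VALUES ('User prefers dark mode', '[0.11, 0.42, ...]', '{"user_id": 42, "type": "preference"}')
RETURNING id;

-- Read path: top-N matches for a query embedding.
SELECT content, metadata
FROM memory
ORDER BY embedding <=> '[0.11, 0.42, ...]'
LIMIT 5;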

5. Use RAG in your LLM pipeline

Every LLM call becomes:

  1. Embed the question
  2. Retrieve relevant memory
  3. Construct prompt
  4. Call the LLM
  5. Store new memories (if needed)
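
Steps 2 and 3 can even be collapsed into a single round trip: one query that folds the top matches into a ready-made context string for the prompt. A sketch (the vector literal and user filter are placeholders):

-- Build the context block for the prompt in one query.
SELECT string_agg(content, E'\n' ORDER BY distance) AS context
FROM (
  SELECT content, embedding <=> '[0.23, -0.99, ...]' AS distance
  FROM memory
  WHERE metadata->>'user_id' = '42'
  ORDER BY distance
  LIMIT 5
) AS top_memories;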

6. Add Guardrails

Production memory systems need:

  • Permission control (per user)
  • Expiration rules
  • Filters (e.g., exclude private data)
  • Maximum memory size
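
Expiration rules and size caps map directly onto SQL. Both statements below are sketches; the 90-day window and the 1,000-row cap are arbitrary numbers to illustrate the idea:

-- Expiration rule: drop memories older than 90 days.
DELETE FROM memory
WHERE created_at < NOW() - INTERVAL '90 days';

-- Maximum memory size: keep only the newest 1,000 rows for a given user.
DELETE FROM memory
WHERE id IN (
  SELECT id FROM memory
  WHERE metadata->>'user_id' = '42'
  ORDER BY created_at DESC
  OFFSET 1000
);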

7. Add Analytics

Track:

  • Hit rate (how often memory is used)
  • Relevance quality
  • Retrieval time

Common Pitfalls and How to Avoid Them

❌ Storing whole conversation transcripts

This leads to massive token usage. Instead, store summaries.

❌ Retrieving too many memories

Keep context small. 3–10 items is ideal.

❌ Wrong distance metric

Most embedding models work best with cosine similarity. Also make sure the index opclass matches the operator you query with: vector_cosine_ops pairs with the <=> operator, otherwise the index won’t be used.

❌ Using RAG without metadata filters

You don’t want another user’s memory leaking into the context.

❌ No indexing

Without IVFFlat/HNSW, retrieval becomes extremely slow.
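
A quick sanity check: run the retrieval query under EXPLAIN ANALYZE and confirm the plan shows an index scan rather than a sequential scan (the vector literal is a placeholder):

-- Inspect the query plan for the retrieval query.
EXPLAIN ANALYZE
SELECT content
FROM memory
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 5;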


When Should You Use PgVector?

Use it if you:

  • Already use PostgreSQL
  • Want simple deployment
  • Want memory that scales to millions of rows
  • Need reliability and ACID guarantees
  • Want to avoid new infrastructure like Pinecone, Weaviate, or Milvus

Do NOT use it if you:

  • Need billion‑scale vector search
  • Require ultra‑low latency for real‑time gaming or streaming
  • Need dynamic sharding across many nodes

But for the vast majority of AI apps, PgVector is an excellent fit.


Conclusion

PgVector is the bridge between ordinary production data and the emerging world of AI memory. For developers building real applications (chatbots, agents, assistants, search engines, personalization engines), it offers the most convenient and stable foundation.

You get:

  • Easy deployment
  • Reliable storage
  • Fast similarity search
  • A complete memory layer for AI

This turns your LLM features from fragile experiments into solid, predictable production systems.

If you’re building AI products in 2025, PgVector isn’t a “nice to have”; it’s a core architectural component.
