Semantic Search in PostgreSQL Using pgvector
From Raw Text to Intelligent Querying
Introduction
I was exploring ways to implement memory in an LLM-powered REST API integration. The assistant’s role was to interpret a grid of numbers — mostly analysis and explanation, not computation-heavy tasks. I’ll skip the full architecture and dive into the core topic: memory.
The agent worked fine initially, but we needed a feature where a user could move across different modules while maintaining conversation continuity. For example, after getting insights from Module A, the user might ask follow-up questions in Module B — and the assistant should still remember what happened earlier.
This introduced the challenge of implementing chat memory.
I considered two options:
Using LangChain - LangChain does an incredible job at managing memory, prompts, retrieval, and more. But I paused on it. While it’s great for rapid prototyping, I generally prefer lower-level APIs. Why? Better control, easier performance tuning, more transparency when debugging, and long-term maintainability. If I need to build something quick and dirty, high-level tools are fine. But for scalable systems, I lean toward minimal abstraction.
Building a memory module - That led me into researching vector stores, vector databases, and even graph databases. While these are powerful, they felt like overkill for my use case, especially since this was a proof of concept (POC). I needed something lean, simple, and ideally something that integrated with the existing stack.
That’s when I discovered pgvector, a PostgreSQL extension that lets you store and search vector embeddings right inside your database.
When implementing memory in an AI agent, one of the key requirements is to fetch relevant past messages and insert them into the current prompt context. There are different ways to retrieve this history — and several excellent articles on the topic. But no matter which method you choose, the first step is: store it somewhere efficiently.
Traditional database search works great for numeric data, UUIDs, and structured filters. But when it comes to long, unstructured text, standard search becomes both inefficient and expensive.
That’s where vector embeddings and semantic search come in, and where pgvector truly shines.
What is pgvector?
pgvector is an open-source PostgreSQL extension that allows you to store, query, and index vector embeddings directly in your database. This brings the power of semantic search to your existing SQL-based workflow — no need for external vector databases like Pinecone or Weaviate.
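To make that concrete, here is roughly what the core workflow looks like: a minimal sketch using the psycopg driver and the pgvector-python adapter (the connection string and the items table are illustrative placeholders, not part of this project):

```python
# Minimal pgvector round trip: enable the extension, store a vector,
# and fetch the nearest neighbor. Connection string is a placeholder.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=demo", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets the driver pass numpy arrays as vectors

conn.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))")
conn.execute("INSERT INTO items (embedding) VALUES (%s)", (np.array([0.1, 0.2, 0.3]),))

# <=> is pgvector's cosine-distance operator; smaller means more similar
row = conn.execute(
    "SELECT id FROM items ORDER BY embedding <=> %s LIMIT 1",
    (np.array([0.1, 0.2, 0.25]),),
).fetchone()
```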
Vector similarity search is used in applications like:
- Recommendation engines
- Image and video search
- Natural Language Processing (NLP)
- AI assistants with contextual memory
In my case, I used pgvector to fetch contextually similar messages from past conversations based on the user’s current query — like a memory system for chat.
What is a vector?
At a high level, a vector is just an array of numbers, like a list of coordinates in space. In machine learning and NLP, vectors represent things like sentences, images, or user preferences. Two vectors that are closer together in this space are considered more similar. For example:
```
"What is inflation?"    → [0.12, 0.43, ..., -0.55]
"Explain rising prices" → [0.11, 0.44, ..., -0.56]
```
Their similarity can be measured using cosine similarity or Euclidean distance.
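As a quick illustration, here is how both metrics can be computed with numpy (the vectors are toy values, not real embeddings):

```python
import numpy as np

a = np.array([0.12, 0.43, -0.55])
b = np.array([0.11, 0.44, -0.56])

# Cosine similarity: 1.0 means identical direction, 0 means orthogonal
cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean (L2) distance: smaller means closer
euclidean_distance = np.linalg.norm(a - b)

print(cosine_similarity, euclidean_distance)
```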
Vectors and Tensors
A vector is a rank-1 tensor — a linear object in an n-dimensional space. In machine learning, we often generalize data as tensors (multi-dimensional arrays):
- Scalars = rank-0 tensors
- Vectors = rank-1
- Matrices = rank-2
- And so on…
In measure theory, the concept of a tensor product of measures allows us to extend integration and volume concepts to higher-dimensional spaces.
If you have two measure spaces $(\Omega_1, \mathcal{A}_1, \mu_1)$ and $(\Omega_2, \mathcal{A}_2, \mu_2)$, then their product measure $\mu_1 \otimes \mu_2$ allows us to define a measure on the Cartesian product $\Omega_1 \times \Omega_2$.
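A concrete instance: taking both factors to be Lebesgue measure $\lambda$ on $\mathbb{R}$, the product measure recovers ordinary area, $(\lambda \otimes \lambda)([a, b] \times [c, d]) = (b - a)(d - c)$.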
This is analogous to how we handle multi-modal or multi-feature data in ML: for instance, embeddings that combine text and image vectors.
But back to our case — you don’t need to dive that deep to understand pgvector. Just know this:
A sentence can be turned into a vector, and vectors help us find similar sentences based on meaning and not exact words.
How Vectors Work
The terms embedding and vector are sometimes used interchangeably, but strictly speaking they are not the same.
An embedding is a numerical representation of data that captures its relevant qualities in a way that ML algorithms can process. The data is embedded in an n-dimensional space. It does not necessarily need to be stored as a vector, though the vector has become the mainstream form.
Word embeddings are mathematical models that encode word relations within a vector space. They are created by an unsupervised training process based on co-occurrence information between words in a large corpus. The encoded relations include semantic and syntactic properties of words (Camacho et al., 2018).
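In practice, you rarely train these models yourself; you call an embedding model. A hedged sketch using the openai Python package (the model name and client setup are my assumptions here, not something this project prescribes):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# text-embedding-3-small returns a 1536-dimensional vector, matching
# the VectorField(dimensions=1536) used later in this article
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is inflation?",
)
embedding = response.data[0].embedding  # a plain Python list of floats
```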
In essence, embeddings turn unstructured data like text into structured numerical formats that machines can efficiently work with. With that understanding, let’s now explore how this theory comes to life through pgvector.
Setting up pgvector in PostgreSQL
I’ll skip the installation steps; the pgvector GitHub repo has a well-written README to get you started.
Once the extension is installed on your PostgreSQL instance, you’ll also need to install the pgvector Python package to enable its use in Django. Again, the documentation covers this setup clearly, so I won’t belabor it here.
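If you manage your schema through Django migrations, the extension can be enabled there as well. A minimal sketch, assuming the pgvector Python package is installed:

```python
# migrations/0001_enable_pgvector.py
from django.db import migrations
from pgvector.django import VectorExtension


class Migration(migrations.Migration):
    dependencies = []
    operations = [
        VectorExtension(),  # runs CREATE EXTENSION IF NOT EXISTS vector
    ]
```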
Defining the Models
The important bit is how we define our models to enable performance testing:
```python
import uuid

from django.db import models
from pgvector.django import IvfflatIndex, VectorField


class BaseConversationMemory(models.Model):
    Id = models.UUIDField(default=uuid.uuid4, primary_key=True, editable=False)
    ConversationID = models.UUIDField(default=uuid.uuid4, editable=False, db_index=True)
    Content = models.TextField()
    DateCreated = models.DateTimeField(auto_now_add=True)

    class Meta:
        abstract = True


class TraditionalConversationMemory(BaseConversationMemory):
    class Meta:
        db_table = 'TraditionalConversationMemories'


class VectorConversationMemory(BaseConversationMemory):
    # 1536 dimensions matches the OpenAI embedding output
    Embedding = VectorField(dimensions=1536)

    class Meta:
        db_table = 'VectorConversationMemories'
        indexes = [
            # IVFFlat index with 100 lists, using cosine distance
            IvfflatIndex(
                name='embeddingIndex',
                fields=['Embedding'],
                lists=100,
                opclasses=['vector_cosine_ops'],
            )
        ]
```
This structure allows us to compare two approaches:
- In-memory similarity using TF-IDF + cosine similarity (Python-based)
- Native vector search using PostgreSQL + pgvector
The goal is to retrieve relevant past messages from a user’s ongoing conversation and use those to build better prompts for the LLM.
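One step the snippets below take for granted is writing memories in the first place: each message has to be embedded at save time. A hedged sketch (OpenAIAgent and its getEmbeddings method are this project's own wrapper; the exact call shape is my assumption):

```python
def save_memory(conversation_id, content):
    # Embed the message once at write time so reads stay cheap
    embedding = OpenAIAgent().getEmbeddings(content)
    VectorConversationMemory.objects.create(
        ConversationID=conversation_id,
        Content=content,
        Embedding=embedding,
    )
```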
TF-IDF Similarity Search
Here’s the in-memory TF-IDF-based approach:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def _handle_existing_conversation(self, question, memories, conversation_id):
    memory_texts = [m.Content for m in memories]
    # Vectorize all past messages plus the current question in one matrix
    documents = memory_texts + [question]
    tfidf_matrix = TfidfVectorizer(stop_words='english').fit_transform(documents)

    # Compare the question (last row) against every stored message
    similarity_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1]).flatten()

    # Keep the five most similar memories
    top_matches = sorted(
        zip(memories, similarity_scores),
        key=lambda x: x[1],
        reverse=True,
    )[:5]

    context_data = "\n".join(mem.Content for mem, _ in top_matches)
    prompt = self._prepare_prompt(question, context_data)
    response_text = OpenAIAgent().chatCompletionRequest(prompt)
    return response_text
```
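Note the trade-off here: every request pulls all of the conversation’s memories out of the database, builds a fresh TF-IDF matrix in Python, and scores each message, so the work grows with conversation length, and the results only match overlapping keywords, not meaning.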
Native Vector Search with pgvector
Here’s the method that uses native vector search:
```python
from django.db.models import F
from pgvector.django import CosineDistance


def _handle_existing_conversation(self, conversation_id, question):
    openai_agent = OpenAIAgent()
    # Embed the incoming question with the same model used at write time
    question_embedding = openai_agent.getEmbeddings(question)

    # Let PostgreSQL compute cosine distance and return the five closest
    memories = (
        VectorConversationMemory.objects
        .filter(ConversationID=conversation_id)
        .annotate(distance=CosineDistance(F("Embedding"), question_embedding))
        .order_by("distance")[:5]
    )

    if not memories.exists():
        return self._handle_new_conversation(conversation_id, "", question)

    context_data = "\n".join(m.Content for m in memories)
    prompt = self._prepare_prompt(question, context_data)
    response_text = openai_agent.chatCompletionRequest(prompt)
    return response_text
```
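One tuning knob worth knowing about: an IVFFlat index only scans ivfflat.probes of its lists per query (1 by default), trading recall for speed. A hedged sketch of raising it for the current database session from Django (the value 10 is arbitrary):

```python
from django.db import connection

# Scan more lists per query for better recall, at some speed cost;
# the setting applies to the current database session
with connection.cursor() as cursor:
    cursor.execute("SET ivfflat.probes = 10")
```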
Results and Performance
We benchmarked both approaches using six different questions.
The resulting graph (shown below) shows vector search consistently returning results in under 12 seconds, a clear demonstration of the advantage of database-level vector indexing over pure Python in-memory similarity.
Conclusion
Using pgvector significantly improved performance compared to performing similarity search purely in memory. By offloading vector similarity computation to the database, we reduced latency and achieved more consistent response times — especially as the number of stored conversations grows.