How to Add Memory and Context to Your AI Agent in 2026
TL;DR
- Build an AI agent that remembers conversation history across multiple interactions
- Implement three memory strategies: short-term (conversation buffer), working memory (summarization), and long-term (vector retrieval)
- Use Redis for persistent storage and ChromaDB for semantic memory search
- By the end, you’ll have a customer support agent that recalls previous conversations, user preferences, and contextual details from weeks ago
Prerequisites
Before we start, make sure you have:
Required:
- Python 3.11 or higher
- An OpenAI API key (get one at platform.openai.com)
- Docker Desktop installed (for Redis)
- 2GB free disk space
Knowledge:
- Basic Python and async/await patterns
- Understanding of API calls and JSON
- Familiarity with environment variables
Time: ~45 minutes to complete
Cost: OpenAI API usage (~$0.10-0.50 for testing), Redis and ChromaDB are free locally
What We’re Building
We’re building a customer support AI agent that maintains three types of memory:
- Short-term memory: Recent conversation turns (last 10 messages)
- Working memory: Compressed summaries of longer conversations
- Long-term memory: Semantic search over all past interactions
Architecture flow:
User message → Load conversation history (Redis) → Retrieve relevant past context (ChromaDB) → Build prompt with context → LLM generates response → Save to memory stores → Return response
This pattern is essential for production AI agents—without memory, every interaction starts from zero, frustrating users and wasting context windows with repeated information.
Step 1: Set Up the Project Environment
Create a new project directory and set up a Python virtual environment:
mkdir ai-agent-memory
cd ai-agent-memory
python3.11 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install required dependencies:
pip install langchain==0.1.20 \
langchain-openai==0.1.8 \
langchain-community==0.0.38 \
redis==5.0.4 \
chromadb==0.4.24 \
python-dotenv==1.0.1 \
tiktoken==0.7.0
Create a .env file for your API key:
echo "OPENAI_API_KEY=your_actual_key_here" > .env
Expected output: No errors. Verify with pip list | grep langchain and you should see the installed packages.
Step 2: Start Redis for Conversation Storage
Redis will store our conversation history with fast key-value lookups by user ID.
Start a Redis container:
docker run -d \
--name redis-agent-memory \
-p 6379:6379 \
redis:7.2-alpine \
redis-server --appendonly yes
Verify Redis is running:
docker exec redis-agent-memory redis-cli ping
Expected output: PONG
This gives us a persistent store where conversation history survives application restarts. The --appendonly yes flag enables persistence to disk.
Step 3: Build the Short-Term Memory Manager
Create memory_manager.py to handle conversation buffers:
import json
import redis
from datetime import datetime, timedelta
from typing import List, Dict
import os
class ConversationMemory:
def __init__(self, redis_host="localhost", redis_port=6379, max_messages=10):
self.redis_client = redis.Redis(
host=redis_host,
port=redis_port,
decode_responses=True
)
self.max_messages = max_messages
self.ttl = 60 * 60 * 24 * 30 # 30 days in seconds
def add_message(self, user_id: str, role: str, content: str):
"""Add a message to the conversation history."""
key = f"conversation:{user_id}"
message = {
"role": role,
"content": content,
"timestamp": datetime.now().isoformat()
}
# Add message to the list
self.redis_client.lpush(key, json.dumps(message))
# Trim to max_messages (keep most recent)
self.redis_client.ltrim(key, 0, self.max_messages - 1)
# Set expiration
self.redis_client.expire(key, self.ttl)
def get_conversation(self, user_id: str) -> List[Dict]:
"""Retrieve conversation history for a user."""
key = f"conversation:{user_id}"
messages = self.redis_client.lrange(key, 0, -1)
# Redis returns most recent first, so reverse for chronological order
return [json.loads(msg) for msg in reversed(messages)]
def clear_conversation(self, user_id: str):
"""Clear conversation history for a user."""
key = f"conversation:{user_id}"
self.redis_client.delete(key)
This class handles our short-term memory:
lpushadds messages to the front of a listltrimautomatically removes old messages beyond our limit- Messages expire after 30 days to comply with data retention policies
- We reverse the list when retrieving so the LLM sees chronological order
Step 4: Create the Vector Store for Long-Term Memory
Create vector_memory.py for semantic retrieval:
import chromadb
from chromadb.config import Settings
from langchain_openai import OpenAIEmbeddings
from typing import List, Dict
import hashlib
class VectorMemory:
def __init__(self, collection_name="agent_memory", persist_directory="./chroma_db"):
self.client = chromadb.Client(Settings(
persist_directory=persist_directory,
anonymized_telemetry=False
))
# Get or create collection
self.collection = self.client.get_or_create_collection(
name=collection_name,
metadata={"hnsw:space": "cosine"}
)
self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
def add_interaction(self, user_id: str, interaction: str, metadata: Dict = None):
"""Store an interaction for semantic retrieval."""
# Create unique ID from content hash
doc_id = hashlib.md5(f"{user_id}:{interaction}".encode()).hexdigest()
# Generate embedding
embedding = self.embeddings.embed_query(interaction)
# Prepare metadata
meta = metadata or {}
meta["user_id"] = user_id
meta["timestamp"] = meta.get("timestamp", "")
# Store in ChromaDB
self.collection.add(
ids=[doc_id],
embeddings=[embedding],
documents=[interaction],
metadatas=[meta]
)
def retrieve_relevant_context(self, user_id: str, query: str, n_results: int = 3) -> List[Dict]:
"""Retrieve semantically similar past interactions."""
query_embedding = self.embeddings.embed_query(query)
results = self.collection.query(
query_embeddings=[query_embedding],
n_results=n_results,
where={"user_id": user_id}
)
if not results["documents"]:
return []
# Format results
context_items = []
for doc, metadata in zip(results["documents"][0], results["metadatas"][0]):
context_items.append({
"content": doc,
"metadata": metadata
})
return context_items
Key concepts here:
- text-embedding-3-small generates 1536-dimensional vectors for semantic search
- ChromaDB uses HNSW (Hierarchical Navigable Small World) for fast approximate nearest neighbor search
- We filter by
user_idto ensure users only retrieve their own memories - The hash-based ID prevents duplicate storage of identical interactions
Step 5: Implement Working Memory with Summarization
Create working_memory.py for conversation summarization:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from typing import List, Dict
class WorkingMemory:
def __init__(self, model="gpt-4o-mini"):
self.llm = ChatOpenAI(model=model, temperature=0.3)
self.summarize_prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert at summarizing conversations. Extract key facts, preferences, and important details."),
("user", "Summarize this conversation history, focusing on facts about the user and context that would be helpful for future interactions:\n\n{conversation}")
])
def summarize_conversation(self, messages: List[Dict]) -> str:
"""Create a compressed summary of conversation history."""
if len(messages) < 5:
return "" # Not enough to summarize
# Format messages for summarization
conversation_text = "\n".join([
f"{msg['role'].upper()}: {msg['content']}"
for msg in messages
])
chain = self.summarize_prompt | self.llm
response = chain.invoke({"conversation": conversation_text})
return response.content
def should_summarize(self, messages: List[Dict], threshold: int = 15) -> bool:
"""Determine if conversation should be summarized."""
return len(messages) >= threshold
Working memory compresses long conversations into summaries, preventing context window overflow. We use:
- gpt-4o-mini for cost-effective summarization
- A threshold of 15 messages before triggering summarization
- Low temperature (0.3) for consistent, factual summaries
Step 6: Build the Agent with Integrated Memory
Create agent.py to tie everything together:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.schema import HumanMessage, AIMessage, SystemMessage
from memory_manager import ConversationMemory
from vector_memory import VectorMemory
from working_memory import WorkingMemory
load_dotenv()
class MemoryAgent:
def __init__(self):
self.conversation_memory = ConversationMemory(max_messages=10)
self.vector_memory = VectorMemory()
self.working_memory = WorkingMemory()
self.llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
self.system_prompt = """You are a helpful customer support agent.
Use the conversation history and relevant context to provide personalized assistance.
If you recall previous interactions or user preferences, mention them naturally.
Be concise but friendly."""
def chat(self, user_id: str, message: str) -> str:
"""Process a user message with full memory integration."""
# 1. Retrieve short-term conversation history
conversation_history = self.conversation_memory.get_conversation(user_id)
# 2. Check if we need to summarize
summary = ""
if self.working_memory.should_summarize(conversation_history):
summary = self.working_memory.summarize_conversation(conversation_history)
# Store summary in vector memory
self.vector_memory.add_interaction(
user_id=user_id,
interaction=summary,
metadata={"type": "summary"}
)
# Clear old history and start fresh with summary
self.conversation_memory.clear_conversation(user_id)
conversation_history = []
# 3. Retrieve relevant long-term context
relevant_context = self.vector_memory.retrieve_relevant_context(
user_id=user_id,
query=message,
n_results=3
)
# 4. Build prompt with all memory types
context_str = ""
if relevant_context:
context_str = "\n\nRelevant past context:\n" + "\n".join([
f"- {item['content']}"
for item in relevant_context
])
if summary:
context_str += f"\n\nRecent conversation summary: {summary}"
# 5. Format conversation history
messages = [SystemMessage(content=self.system_prompt + context_str)]
for msg in conversation_history:
if msg["role"] == "user":
messages.append(HumanMessage(content=msg["content"]))
else:
messages.append(AIMessage(content=msg["content"]))
messages.append(HumanMessage(content=message))
# 6. Generate response
response = self.llm.invoke(messages)
response_text = response.content
# 7. Save to memory stores
self.conversation_memory.add_message(user_id, "user", message)
self.conversation_memory.add_message(user_id, "assistant", response_text)
# Store significant interactions in vector memory
interaction = f"User: {message}\nAgent: {response_text}"
self.vector_memory.add_interaction(user_id, interaction)
return response_text
This agent orchestrates all three memory systems:
- Loads recent conversation from Redis
- Checks if summarization is needed
- Retrieves semantically relevant past interactions
- Builds a context-rich prompt
- Generates a response with full context awareness
- Saves the new interaction to all memory stores
Step 7: Create a Test Interface
Create main.py to interact with your agent:
from agent import MemoryAgent
import sys
def main():
agent = MemoryAgent()
user_id = "test_user_123"
print("AI Agent with Memory - Type 'quit' to exit\n")
print("Try having a conversation, then restart and reference earlier topics!\n")
while True:
try:
user_input = input("You: ").strip()
if user_input.lower() in ["quit", "exit", "q"]:
print("Goodbye!")
break
if not user_input:
continue
response = agent.chat(user_id, user_input)
print(f"\nAgent: {response}\n")
except KeyboardInterrupt:
print("\n\nGoodbye!")
sys.exit(0)
except Exception as e:
print(f"\nError: {e}\n")
if __name__ == "__main__":
main()
Run your agent:
python main.py
Try this conversation flow to test memory:
You: Hi, my name is Alex and I'm interested in your premium plan
Agent: [responds and remembers name]
You: What are the pricing options?
Agent: [provides pricing]
You: I need to think about it
Agent: [acknowledges]
[Exit and restart the program]
You: I'm back, still considering the premium plan
Agent: [Should reference your name Alex and the previous pricing discussion]
Expected behavior: The agent recalls your name, the premium plan interest, and previous pricing discussion across sessions.
Testing Your Implementation
Create test_memory.py to verify all memory types work:
from agent import MemoryAgent
import time
def test_short_term_memory():
"""Test that agent remembers within a conversation."""
agent = MemoryAgent()
user_id = "test_short_term"
# Set a preference
agent.chat(user_id, "I prefer morning appointments")
# Reference it immediately
response = agent.chat(user_id, "What time should we schedule?")
assert "morning" in response.lower(), "Agent didn't recall morning preference"
print("✓ Short-term memory test passed")
def test_long_term_memory():
"""Test that agent retrieves semantically relevant past context."""
agent = MemoryAgent()
user_id = "test_long_term"
# Create some history
agent.chat(user_id, "I have a dog named Max")
agent.chat(user_id, "Thanks")
# Clear short-term but keep vector memory
agent.conversation_memory.clear_conversation(user_id)
# Wait for embedding to process
time.sleep(1)
# Ask related question
response = agent.chat(user_id, "Do you remember my pet?")
assert "max" in response.lower() or "dog" in response.lower(), "Agent didn't retrieve long-term memory"
print("✓ Long-term memory test passed")
def test_conversation_persistence():
"""Test that conversations persist across agent instances."""
user_id = "test_persistence"
# First agent instance
agent1 = MemoryAgent()
agent1.chat(user_id, "My account number is 12345")
# New agent instance (simulates restart)
agent2 = MemoryAgent()
response = agent2.chat(user_id, "What's my account number?")
assert "12345" in response, "Agent didn't persist conversation to Redis"
print("✓ Persistence test passed")
if __name__ == "__main__":
print("Running memory tests...\n")
test_short_term_memory()
test_long_term_memory()
test_conversation_persistence()
print("\nAll tests passed!")
Run the tests:
python test_memory.py
Expected output:
Running memory tests...
✓ Short-term memory test passed
✓ Long-term memory test passed
✓ Persistence test passed
All tests passed!
Common Issues & Fixes
Problem: redis.exceptions.ConnectionError: Error connecting to Redis
Cause: Redis container isn’t running or wrong port
Fix: Check Docker and restart Redis:
docker ps | grep redis
docker start redis-agent-memory
Problem: openai.error.AuthenticationError: Incorrect API key
Cause: Missing or invalid OpenAI API key
Fix: Verify your .env file and reload:
cat .env # Should show OPENAI_API_KEY=sk-...
source venv/bin/activate # Reload environment
Problem: Agent doesn’t recall previous conversations after restart Cause: Using different user IDs or Redis data cleared Fix: Ensure consistent user IDs and check Redis has data:
docker exec redis-agent-memory redis-cli KEYS "conversation:*"
Problem: chromadb.errors.InvalidDimensionException
Cause: Embedding model mismatch between sessions
Fix: Delete the ChromaDB directory and restart:
rm -rf ./chroma_db
python main.py # Will recreate with correct dimensions
Problem: Agent responses are too slow (>5 seconds)
Cause: Vector search with large databases
Fix: Reduce n_results in retrieval and add result caching:
relevant_context = self.vector_memory.retrieve_relevant_context(
user_id=user_id,
query=message,
n_results=2 # Reduced from 3
)
Next Steps
You now have a production-ready memory system for AI agents. Here’s how to extend it:
Add user preference extraction: Parse conversations for explicit preferences (timezone, communication style) and store them as structured metadata:
def extract_preferences(self, message: str) -> Dict:
# Use GPT to extract structured preferences
# Store in Redis hash for fast lookup
Implement memory decay: Weight older memories less in retrieval by adding time-based scoring:
# In vector_memory.py
from datetime import datetime
def retrieve_with_decay(self, query, decay_factor=0.95):
results = self.collection.query(...)
# Apply exponential decay based on timestamp
# More recent = higher score
Add multi-user memory sharing: For team contexts, allow agents to access relevant interactions from other team members while respecting privacy:
def retrieve_team_context(self, team_id: str, query: str):
# Filter by team_id instead of user_id
# Exclude PII-tagged interactions
Monitor memory costs: Track embedding and storage costs by adding telemetry:
import logging
class VectorMemory:
def __init__(self, ...):
self.embeddings_count = 0
def add_interaction(self, ...):
self.embeddings_count += 1
logging.info(f"Total embeddings: {self.embeddings_count}")
Challenge: Implement “memory importance scoring” where the agent decides which interactions are worth storing long-term vs. ephemeral chitchat. Use a classifier to filter out low-value exchanges before adding to vector memory.
For more on AI agent patterns, check out our tutorial on “Building Production-Ready RAG Pipelines” and “Streaming Responses with LangChain Agents”.