How to Build an Autonomous Coding Agent with Function Calling in 2026
TL;DR
- Build a Python-based autonomous coding agent that can read, analyze, write, and modify code files without constant human prompting
- Implement OpenAI’s function calling to give your agent tools for file operations, code execution, and self-directed task planning
- Create a feedback loop where the agent evaluates its own work and iterates until task completion
- By the end, you’ll have a working agent that can take a high-level request like “create a REST API for user management” and autonomously write the necessary files
Prerequisites
Before starting, ensure you have:
Required:
- Python 3.11 or higher installed
- An OpenAI API key with access to GPT-4 or GPT-4 Turbo (get one at platform.openai.com)
- pip package manager
- A code editor (VS Code recommended)
- Basic terminal/command line familiarity
Knowledge:
- Intermediate Python (functions, classes, error handling)
- Understanding of API requests and JSON
- Basic familiarity with async/await patterns
- Conceptual knowledge of how LLMs work
Time: ~45 minutes
Cost: Approximately $0.10-0.50 per agent run depending on task complexity (GPT-4 Turbo pricing)
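To sanity-check that range for your own runs, multiply the tokens a run consumes by the per-token price. The sketch below uses a hypothetical helper, `estimate_run_cost`, which is not part of the tutorial code. Its default prices of $10 per million input tokens and $30 per million output tokens are assumed GPT-4 Turbo rates, so check OpenAI's current pricing page before relying on them.

```python
def estimate_run_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float = 10.00,   # assumed USD rate, verify before use
    output_price_per_million: float = 30.00,  # assumed USD rate, verify before use
) -> float:
    """Return the approximate USD cost of one agent run."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return round(input_cost + output_cost, 4)


# Every iteration resends the whole history, so prompt tokens dominate.
# Example: 10 iterations averaging 2,000 prompt and 200 completion tokens each.
print(estimate_run_cost(prompt_tokens=20_000, completion_tokens=2_000))  # 0.26
```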
What We’re Building
We’re creating an autonomous coding agent that operates in a loop: plan → execute → evaluate → iterate. Unlike a simple chatbot, this agent makes its own decisions about which tools to use and when to stop.
The architecture looks like this:
User Request → Agent (GPT-4) → Function Selection → Tool Execution → Result Analysis → Decision (continue/complete) → Loop or Finish
This approach is worth learning because it goes beyond single-turn AI interactions. An agent can carry a complex, multi-step task through to completion, where a standard chatbot would need dozens of back-and-forth prompts.
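Without the API calls, that loop is just a bounded `while` with two exits. In the minimal sketch below, a hypothetical `decide` callable stands in for the model and plain functions stand in for the tools. The real agent built in Steps 2 to 5 follows the same shape.

```python
from typing import Callable, Dict, List, Tuple

# A decision is (tool_name, argument); "complete_task" ends the loop.
Decision = Tuple[str, str]


def run_loop(
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> str:
    """Plan -> execute -> evaluate -> iterate until done or out of iterations."""
    observations: List[str] = []  # stands in for the conversation history
    for _ in range(max_iterations):
        tool_name, arg = decide(observations)  # plan: pick the next action
        if tool_name == "complete_task":
            return f"done: {arg}"
        result = tools[tool_name](arg)  # execute
        observations.append(result)  # evaluated on the next decide() call
    return "stopped: max iterations reached"


# A scripted stand-in for the model: write once, then declare completion.
def scripted_decide(observations: List[str]) -> Decision:
    if not observations:
        return ("write_file", "hello.py")
    return ("complete_task", "wrote hello.py")


print(run_loop(scripted_decide, {"write_file": lambda name: f"wrote {name}"}))
# done: wrote hello.py
```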
Step 1: Set Up the Project Environment
Create a new directory for our agent and set up a virtual environment to keep dependencies isolated.
```bash
mkdir autonomous-coding-agent
cd autonomous-coding-agent
python -m venv venv
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
```
Now install the required dependencies:
```bash
pip install openai==1.12.0 python-dotenv==1.0.0
```
Create a .env file to store your API key securely:
```bash
echo "OPENAI_API_KEY=your_api_key_here" > .env
```
Replace your_api_key_here with your actual OpenAI API key.
Expected output: You should see pip successfully install both packages with no errors. The .env file will be created in your project root.
Step 2: Create the Base Agent Class
Create a new file called agent.py. This will hold our main agent logic.
```python
import os
import json
from typing import List, Dict, Any, Optional
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()


class AutonomousCodingAgent:
    def __init__(self, workspace_dir: str = "./workspace"):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        # Store an absolute path so it stays valid when a subprocess
        # changes its working directory to the workspace (see Step 4)
        self.workspace_dir = os.path.abspath(workspace_dir)
        self.conversation_history: List[Dict[str, Any]] = []
        self.max_iterations = 10

        # Create workspace directory if it doesn't exist
        os.makedirs(self.workspace_dir, exist_ok=True)

        # System prompt that defines the agent's behavior
        self.system_prompt = """You are an autonomous coding agent. Your job is to complete programming tasks by:
1. Breaking down the request into steps
2. Using available tools to read/write files and execute code
3. Evaluating your own work
4. Iterating until the task is complete
When you believe the task is done, call the complete_task function with a summary.
Be thorough but efficient. Always test your code before marking complete."""

        self.conversation_history.append({
            "role": "system",
            "content": self.system_prompt
        })

    def add_message(self, role: str, content: str):
        """Add a message to the conversation history."""
        self.conversation_history.append({
            "role": role,
            "content": content
        })
```
This base class initializes our agent with:
- An OpenAI client for API calls
- A workspace directory where the agent can create files
- Conversation history to maintain context across iterations
- A system prompt that instructs the agent on its autonomous behavior
The max_iterations safeguard prevents infinite loops if the agent gets stuck.
Step 3: Define Agent Tools (Functions)
Add these tool definitions to agent.py. These are the functions our agent can call:
```python
class AutonomousCodingAgent:
    # ... __init__ and add_message from Step 2 ...

    def get_available_tools(self) -> List[Dict[str, Any]]:
        """Define all tools the agent can use."""
        return [
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file in the workspace",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "filename": {
                                "type": "string",
                                "description": "The name of the file to read (relative to workspace)"
                            }
                        },
                        "required": ["filename"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file in the workspace. Creates or overwrites the file.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "filename": {
                                "type": "string",
                                "description": "The name of the file to write (relative to workspace)"
                            },
                            "content": {
                                "type": "string",
                                "description": "The content to write to the file"
                            }
                        },
                        "required": ["filename", "content"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "list_files",
                    "description": "List all files in the workspace directory",
                    "parameters": {
                        "type": "object",
                        "properties": {},
                        "required": []
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "execute_python",
                    "description": "Execute Python code in the workspace and return the output. Use this to test code.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The Python code to execute"
                            }
                        },
                        "required": ["code"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "complete_task",
                    "description": "Call this when the task is fully complete. Provide a summary of what was accomplished.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "summary": {
                                "type": "string",
                                "description": "A summary of what was accomplished"
                            }
                        },
                        "required": ["summary"]
                    }
                }
            }
        ]
```
Each tool follows OpenAI’s function calling schema with:
- A descriptive name
- A clear description of what it does
- A JSON Schema defining its parameters
- Required vs. optional parameters
The complete_task function is critical—it’s how the agent signals it’s done, preventing endless loops.
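A typo in a schema is easy to miss, because nothing in your own code exercises the schema. A `required` entry that names no property is one example. A quick structural check before the first request catches this kind of mistake. `check_tool_schema` is a hypothetical helper. It checks only the fields used in this tutorial, not the full JSON Schema specification.

```python
from typing import Any, Dict, List


def check_tool_schema(tool: Dict[str, Any]) -> List[str]:
    """Return problems found in one function-calling tool definition."""
    problems: List[str] = []
    if tool.get("type") != "function":
        problems.append("top-level 'type' must be 'function'")
    fn = tool.get("function", {})
    name = fn.get("name", "<unnamed>")
    if not fn.get("name"):
        problems.append("missing function name")
    if not fn.get("description"):
        problems.append(f"{name}: missing description")
    params = fn.get("parameters", {})
    if params.get("type") != "object":
        problems.append(f"{name}: parameters.type should be 'object'")
    properties = params.get("properties", {})
    for req in params.get("required", []):
        if req not in properties:
            problems.append(f"{name}: required field '{req}' is not in properties")
    return problems


read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read the contents of a file in the workspace",
        "parameters": {
            "type": "object",
            "properties": {"filename": {"type": "string"}},
            "required": ["file_name"],  # deliberate typo
        },
    },
}
print(check_tool_schema(read_file_tool))
# ["read_file: required field 'file_name' is not in properties"]
```

In the agent, you could loop over `self.get_available_tools()` once at startup and fail fast if any tool reports a problem.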
Step 4: Implement Tool Execution Logic
Add the actual implementations of these tools to agent.py:
```python
# Add these to the imports at the top of agent.py
import subprocess
import sys


class AutonomousCodingAgent:
    # ... __init__, add_message, and get_available_tools from Steps 2-3 ...

    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
        """Execute a tool and return its result."""
        try:
            if tool_name == "read_file":
                return self._read_file(arguments["filename"])
            elif tool_name == "write_file":
                return self._write_file(arguments["filename"], arguments["content"])
            elif tool_name == "list_files":
                return self._list_files()
            elif tool_name == "execute_python":
                return self._execute_python(arguments["code"])
            elif tool_name == "complete_task":
                return f"TASK_COMPLETE: {arguments['summary']}"
            else:
                return f"Error: Unknown tool '{tool_name}'"
        except Exception as e:
            return f"Error executing {tool_name}: {str(e)}"

    def _read_file(self, filename: str) -> str:
        filepath = os.path.join(self.workspace_dir, filename)
        try:
            with open(filepath, 'r') as f:
                content = f.read()
            return f"File '{filename}' contents:\n{content}"
        except FileNotFoundError:
            return f"Error: File '{filename}' not found"
        except Exception as e:
            return f"Error reading file: {str(e)}"

    def _write_file(self, filename: str, content: str) -> str:
        filepath = os.path.join(self.workspace_dir, filename)
        try:
            # Create subdirectories if needed
            os.makedirs(os.path.dirname(filepath), exist_ok=True)
            with open(filepath, 'w') as f:
                f.write(content)
            return f"Successfully wrote {len(content)} characters to '{filename}'"
        except Exception as e:
            return f"Error writing file: {str(e)}"

    def _list_files(self) -> str:
        try:
            files = []
            for root, dirs, filenames in os.walk(self.workspace_dir):
                for filename in filenames:
                    rel_path = os.path.relpath(
                        os.path.join(root, filename),
                        self.workspace_dir
                    )
                    files.append(rel_path)
            if not files:
                return "Workspace is empty"
            return "Files in workspace:\n" + "\n".join(f"- {f}" for f in files)
        except Exception as e:
            return f"Error listing files: {str(e)}"

    def _execute_python(self, code: str) -> str:
        """Execute Python code in a subprocess and return output."""
        # Absolute path: the child process runs with cwd set to the workspace,
        # so a relative path like ./workspace/_temp_exec.py would not resolve
        temp_file = os.path.abspath(os.path.join(self.workspace_dir, "_temp_exec.py"))
        try:
            # Write code to a temporary file in workspace
            with open(temp_file, 'w') as f:
                f.write(code)

            # Execute with timeout
            result = subprocess.run(
                [sys.executable, temp_file],
                cwd=self.workspace_dir,
                capture_output=True,
                text=True,
                timeout=10
            )

            output = result.stdout if result.stdout else "(no output)"
            if result.stderr:
                output += f"\nErrors:\n{result.stderr}"
            return f"Execution completed with return code {result.returncode}:\n{output}"
        except subprocess.TimeoutExpired:
            return "Error: Code execution timed out (10s limit)"
        except Exception as e:
            return f"Error executing code: {str(e)}"
        finally:
            # Clean up the temp file even when execution fails or times out
            if os.path.exists(temp_file):
                os.remove(temp_file)
```
Key implementation details:
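- File paths are joined onto the workspace directory, so ordinary relative filenames land inside it. On its own this does not block names such as ../other.py or absolute paths, which os.path.join lets escape the workspace
- Python execution runs in a subprocess with a 10-second timeout. This stops runaway scripts but is not a security sandbox: the code runs with your user account's permissions
- Error handling at every level returns informative messages to the agent
- All results are returned as strings that the LLM can understand
Each result is appended to the conversation, so the model sees it on the next iteration and can adjust, for example by fixing code after reading a traceback.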
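To confine file operations to the workspace, resolve each path and check it before opening. Below is a minimal sketch using a hypothetical `resolve_in_workspace` helper, which `_read_file` and `_write_file` could call in place of `os.path.join`.

```python
import os
import tempfile


def resolve_in_workspace(workspace_dir: str, filename: str) -> str:
    """Return the absolute path for filename, refusing anything outside workspace_dir."""
    root = os.path.realpath(workspace_dir)
    # realpath collapses "..", follows symlinks, and lets an absolute filename win the join
    candidate = os.path.realpath(os.path.join(root, filename))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Path '{filename}' escapes the workspace")
    return candidate


with tempfile.TemporaryDirectory() as ws:
    print(resolve_in_workspace(ws, "src/app.py"))  # an absolute path inside ws
    try:
        resolve_in_workspace(ws, "../secrets.txt")
    except ValueError as err:
        print(err)  # Path '../secrets.txt' escapes the workspace
```

Raising `ValueError` fits the existing tool methods: their `except Exception` branches turn it into an error string the agent can read.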
Step 5: Build the Main Agent Loop
Add the core autonomous loop logic to agent.py:
```python
class AutonomousCodingAgent:
    # ... methods from Steps 2-4 ...

    def run(self, user_request: str) -> str:
        """Run the agent autonomously until task completion."""
        print(f"\n🤖 Agent starting task: {user_request}\n")

        # Add user request to conversation
        self.add_message("user", user_request)

        iteration = 0
        while iteration < self.max_iterations:
            iteration += 1
            print(f"\n--- Iteration {iteration} ---")

            # Get agent's next action
            try:
                response = self.client.chat.completions.create(
                    model="gpt-4-turbo-preview",
                    messages=self.conversation_history,
                    tools=self.get_available_tools(),
                    tool_choice="auto"
                )
                assistant_message = response.choices[0].message

                # Add assistant's response to history as a plain dict. Include
                # "tool_calls" only when the model made calls: a null or empty
                # value can be rejected by the API on the next request.
                history_entry: Dict[str, Any] = {
                    "role": "assistant",
                    "content": assistant_message.content,
                }
                if assistant_message.tool_calls:
                    history_entry["tool_calls"] = [
                        {
                            "id": tc.id,
                            "type": "function",
                            "function": {
                                "name": tc.function.name,
                                "arguments": tc.function.arguments,
                            },
                        }
                        for tc in assistant_message.tool_calls
                    ]
                self.conversation_history.append(history_entry)

                # Check if agent wants to use tools
                if assistant_message.tool_calls:
                    # Execute each tool call
                    for tool_call in assistant_message.tool_calls:
                        tool_name = tool_call.function.name
                        try:
                            arguments = json.loads(tool_call.function.arguments)
                        except json.JSONDecodeError as e:
                            # Still answer this tool_call_id: an unanswered tool
                            # call makes every later API request fail
                            result = f"Error: invalid JSON arguments for {tool_name}: {e}"
                        else:
                            print(f"\n🔧 Calling tool: {tool_name}")
                            print(f"   Arguments: {arguments}")
                            # Execute the tool
                            result = self.execute_tool(tool_name, arguments)
                        print(f"   Result: {result[:200]}{'...' if len(result) > 200 else ''}")

                        # Check for completion
                        if result.startswith("TASK_COMPLETE:"):
                            summary = result.replace("TASK_COMPLETE:", "").strip()
                            print(f"\n✅ Task completed: {summary}")
                            return summary

                        # Add tool result to conversation
                        self.conversation_history.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": result
                        })

                # If no tool calls, agent is just thinking
                elif assistant_message.content:
                    print(f"\n💭 Agent thinking: {assistant_message.content}")

            except Exception as e:
                error_msg = f"Error in agent loop: {str(e)}"
                print(f"\n❌ {error_msg}")
                self.add_message("user", f"An error occurred: {error_msg}. Please try a different approach.")

        return f"Task incomplete: Reached maximum iterations ({self.max_iterations})"
```
This loop:
- Sends the conversation history to GPT-4 with available tools
- Processes any tool calls the model requests
- Adds tool results back to the conversation
- Continues until complete_task is called or max iterations are reached
The agent is truly autonomous—we never prompt it with “what next?” It decides on its own.
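The API is strict about message order. Every assistant message that carries `tool_calls` must be followed by one `tool` message per call id before the conversation continues. When debugging the loop, a check like the sketch below can pinpoint a broken history before you send it. `check_tool_pairing` is a hypothetical helper that works on the plain-dict messages stored in `conversation_history`.

```python
from typing import Any, Dict, List


def check_tool_pairing(messages: List[Dict[str, Any]]) -> List[str]:
    """Return problems with tool-call/tool-result ordering (empty list if OK)."""
    problems: List[str] = []
    pending: set = set()  # tool_call ids still waiting for a "tool" reply
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in pending:
                problems.append(f"message {i}: tool result for unknown id {call_id!r}")
            pending.discard(call_id)
            continue
        if pending:
            problems.append(f"message {i}: unanswered tool calls {sorted(pending)}")
            pending = set()
        if role == "assistant":
            pending = {tc["id"] for tc in msg.get("tool_calls") or []}
    if pending:
        problems.append(f"end of history: unanswered tool calls {sorted(pending)}")
    return problems


history = [
    {"role": "system", "content": "You are an autonomous coding agent."},
    {"role": "user", "content": "Create hello.py"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "write_file", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "ok"},
]
print(check_tool_pairing(history))      # []
print(check_tool_pairing(history[:3]))  # ["end of history: unanswered tool calls ['call_1']"]
```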
Step 6: Create the Main Script
Create a new file called main.py to run the agent:
```python
from agent import AutonomousCodingAgent


def main():
    # Initialize agent with workspace directory
    agent = AutonomousCodingAgent(workspace_dir="./workspace")

    # Example task: Create a simple web scraper
    task = """
    Create a Python web scraper that:
    1. Takes a URL as input
    2. Fetches the page content
    3. Extracts all links from the page
    4. Saves the links to a JSON file
    Include proper error handling and a main function to test it.
    """

    # Run the agent
    result = agent.run(task)

    print("\n" + "="*60)
    print("FINAL RESULT:")
    print(result)
    print("="*60)


if __name__ == "__main__":
    main()
```
Run the agent:
```bash
python main.py
```
Expected output: You’ll see the agent working through iterations:
```text
🤖 Agent starting task: Create a Python web scraper...

--- Iteration 1 ---

🔧 Calling tool: write_file
   Arguments: {'filename': 'scraper.py', 'content': '...'}
   Result: Successfully wrote 1247 characters to 'scraper.py'

--- Iteration 2 ---

🔧 Calling tool: write_file
   Arguments: {'filename': 'requirements.txt', 'content': 'requests==2.31.0\nbeautifulsoup4==4.12.0'}
   Result: Successfully wrote 39 characters to 'requirements.txt'

--- Iteration 3 ---

🔧 Calling tool: execute_python
...

✅ Task completed: Created a web scraper with error handling, link extraction, and JSON export
```
The agent will create files in the workspace/ directory that you can inspect and run.
Step 7: Add Memory and Context Management
For longer tasks, conversation history grows large. Add a simple cap on the number of messages kept to agent.py:
```python
class AutonomousCodingAgent:
    # ... existing methods ...

    def trim_conversation_history(self, max_messages: int = 20):
        """Keep conversation history manageable by trimming old messages."""
        if len(self.conversation_history) <= max_messages:
            return
        # Always keep the system message and the original user request
        head = self.conversation_history[:2]
        # Keep the most recent messages
        tail = self.conversation_history[-(max_messages - 2):]
        # The API rejects a "tool" message whose assistant tool_calls
        # message was trimmed away, so drop orphaned tool results
        while tail and tail[0]["role"] == "tool":
            tail.pop(0)
        self.conversation_history = head + tail

    def run(self, user_request: str) -> str:
        """Run the agent autonomously until task completion."""
        print(f"\n🤖 Agent starting task: {user_request}\n")
        self.add_message("user", user_request)

        iteration = 0
        while iteration < self.max_iterations:
            iteration += 1
            print(f"\n--- Iteration {iteration} ---")

            # Trim history before each API call
            self.trim_conversation_history(max_messages=20)

            # ... rest of the loop as before
```
Capping the message count keeps requests from growing without bound while preserving enough recent context for the agent to stay on track. The cap counts messages, not tokens, so a few very large tool results can still push a request past the model's context window.
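If you want the cap to track size rather than message count, you can budget by an estimated token count. The sketch below uses a rough rule of thumb of about four characters per token for English text. This is an approximation; OpenAI's `tiktoken` library gives exact counts. `estimate_tokens` and `trim_to_token_budget` are hypothetical helpers, not part of the agent above.

```python
from typing import Any, Dict, List


def estimate_tokens(message: Dict[str, Any]) -> int:
    """Very rough estimate: ~4 characters per token, counting tool-call arguments."""
    text = message.get("content") or ""
    for call in message.get("tool_calls") or []:
        text += call["function"]["arguments"]
    return len(text) // 4 + 1


def trim_to_token_budget(messages: List[Dict[str, Any]], budget: int) -> List[Dict[str, Any]]:
    """Keep the system message plus the newest messages that fit in the budget."""
    system, rest = messages[0], messages[1:]
    used = estimate_tokens(system)
    kept: List[Dict[str, Any]] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # Drop tool results whose assistant tool_calls message did not fit
    while kept and kept[0]["role"] == "tool":
        kept.pop(0)
    return [system] + kept


history = [{"role": "system", "content": "x" * 40}] + [
    {"role": "user", "content": "y" * 400} for _ in range(5)
]
print(len(trim_to_token_budget(history, budget=250)))  # 3: system + 2 newest
```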
Step 8: Implement Progress Tracking
Create tracker.py to monitor agent progress:
```python
import json
from datetime import datetime
from typing import List, Dict, Any


class ProgressTracker:
    def __init__(self, log_file: str = "agent_progress.json"):
        self.log_file = log_file
        self.events: List[Dict[str, Any]] = []

    def log_event(self, event_type: str, details: Dict[str, Any]):
        """Log an agent event with timestamp."""
        event = {
            "timestamp": datetime.now().isoformat(),
            "type": event_type,
            "details": details
        }
        self.events.append(event)

        # Write to file
        with open(self.log_file, 'w') as f:
            json.dump(self.events, f, indent=2)

    def get_summary(self) -> Dict[str, Any]:
        """Generate a summary of agent activity."""
        tool_calls = [e for e in self.events if e["type"] == "tool_call"]
        errors = [e for e in self.events if e["type"] == "error"]
        return {
            "total_events": len(self.events),
            "tool_calls": len(tool_calls),
            "errors": len(errors),
            "duration_seconds": self._calculate_duration()
        }

    def _calculate_duration(self) -> float:
        if len(self.events) < 2:
            return 0.0
        start = datetime.fromisoformat(self.events[0]["timestamp"])
        end = datetime.fromisoformat(self.events[-1]["timestamp"])
        return (end - start).total_seconds()
```
Integrate the tracker into agent.py:
```python
from tracker import ProgressTracker


class AutonomousCodingAgent:
    def __init__(self, workspace_dir: str = "./workspace"):
        # ... existing initialization
        self.tracker = ProgressTracker()

    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
        self.tracker.log_event("tool_call", {
            "tool": tool_name,
            "arguments": arguments
        })
        # ... existing tool execution
```
Now you can review agent_progress.json after each run to analyze the agent’s decision-making process.
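Once a run has finished, a few lines of standard-library Python make the log easier to scan. This sketch assumes the event format written by `ProgressTracker.log_event` above: a JSON list of objects with `type` and `details` keys. `tool_usage` is a hypothetical helper name; call it from a REPL after a run.

```python
import json
from collections import Counter
from typing import Dict


def tool_usage(log_file: str = "agent_progress.json") -> Dict[str, int]:
    """Count how many times each tool was called in a logged run, most used first."""
    with open(log_file) as f:
        events = json.load(f)
    counts = Counter(
        e["details"]["tool"] for e in events if e["type"] == "tool_call"
    )
    return dict(counts.most_common())
```

When one tool dominates the counts, such as `execute_python` called over and over, it often signals the stuck loop described under Common Issues.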
Testing Your Implementation
Create test_agent.py to verify functionality end to end. These tests drive the real agent, so every run makes OpenAI API calls and costs money. Because model output varies between runs, they can occasionally fail even when your code is correct:
```python
import os
import shutil
from agent import AutonomousCodingAgent


def test_simple_file_creation():
    """Test that agent can create a simple file."""
    # Clean workspace
    if os.path.exists("./test_workspace"):
        shutil.rmtree("./test_workspace")

    agent = AutonomousCodingAgent(workspace_dir="./test_workspace")
    task = "Create a file called 'hello.py' that prints 'Hello, World!' when run."
    result = agent.run(task)

    # Verify file exists
    assert os.path.exists("./test_workspace/hello.py"), "File was not created"

    # Verify content
    with open("./test_workspace/hello.py", 'r') as f:
        content = f.read()
    assert "Hello, World!" in content, "Content doesn't match"

    print("✅ Test passed: Simple file creation")


def test_multi_file_project():
    """Test that agent can create a multi-file project."""
    if os.path.exists("./test_workspace"):
        shutil.rmtree("./test_workspace")

    agent = AutonomousCodingAgent(workspace_dir="./test_workspace")
    task = """
    Create a simple calculator package with:
    - calculator.py with add, subtract, multiply, divide functions
    - test_calculator.py with tests for each function
    - __init__.py to make it a package
    """
    result = agent.run(task)

    # Verify all files exist
    assert os.path.exists("./test_workspace/calculator.py")
    assert os.path.exists("./test_workspace/test_calculator.py")
    assert os.path.exists("./test_workspace/__init__.py")

    print("✅ Test passed: Multi-file project creation")


if __name__ == "__main__":
    test_simple_file_creation()
    test_multi_file_project()
    print("\n🎉 All tests passed!")
```
Run the tests:
```bash
python test_agent.py
```
Expected output:
```text
✅ Test passed: Simple file creation
✅ Test passed: Multi-file project creation

🎉 All tests passed!
```
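For fast checks of the loop mechanics without API calls, you can swap in a scripted fake client. It only needs to mimic the attribute layout the loop reads, such as `response.choices[0].message.tool_calls[i].function.arguments`. The sketch below is illustrative: `FakeClient` and `scripted_call` are hypothetical names, and they imitate only the parts of the OpenAI SDK response objects that the loop touches.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


def scripted_call(call_id: str, name: str, arguments: Dict[str, Any]) -> SimpleNamespace:
    """Build an object shaped like one tool call in an SDK response."""
    return SimpleNamespace(
        id=call_id,
        type="function",
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments)),
    )


class FakeClient:
    """Replays canned assistant turns through client.chat.completions.create()."""

    def __init__(self, turns: List[List[SimpleNamespace]]):
        self._turns = list(turns)
        self.requests: List[Dict[str, Any]] = []  # what the agent sent, for assertions
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs: Any) -> SimpleNamespace:
        self.requests.append(kwargs)
        tool_calls = self._turns.pop(0)
        message = SimpleNamespace(content=None, tool_calls=tool_calls or None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Two turns: write a file, then declare the task complete.
fake = FakeClient([
    [scripted_call("call_1", "write_file",
                   {"filename": "hello.py", "content": "print('Hello, World!')"})],
    [scripted_call("call_2", "complete_task", {"summary": "Created hello.py"})],
])
reply = fake.chat.completions.create(model="any", messages=[])
print(reply.choices[0].message.tool_calls[0].function.name)  # write_file
```

To use it with the agent, assign `agent.client = FakeClient([...])` after constructing the agent and before calling `agent.run(...)`. Then assert on the files in the workspace and on `fake.requests`. The constructor still creates a real `OpenAI` client, which raises an error when no API key is configured. Set `OPENAI_API_KEY` to any placeholder value in the test environment.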
Common Issues & Fixes
Problem: Agent gets stuck in a loop calling the same tool repeatedly
- Cause: The system prompt doesn’t emphasize evaluation of results
- Fix: Update the system prompt to explicitly require result evaluation:
```python
self.system_prompt = """...After each tool use, evaluate if it succeeded. If a tool fails twice, try a different approach..."""
```
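Prompt wording helps, but you can also detect the loop in code. Below is a minimal sketch of a hypothetical `RepeatDetector` that the agent would consult before executing each tool call. When it fires, you could return an error string as the tool result that tells the model to change approach.

```python
import json
from collections import Counter
from typing import Any, Dict


class RepeatDetector:
    """Flags a tool call once the identical call has been seen `limit` times."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.seen: Counter = Counter()

    def is_repeating(self, tool_name: str, arguments: Dict[str, Any]) -> bool:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same signature
        signature = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[signature] += 1
        return self.seen[signature] >= self.limit


detector = RepeatDetector(limit=3)
for _ in range(3):
    repeating = detector.is_repeating("read_file", {"filename": "app.py"})
print(repeating)  # True: the third identical call trips the detector
```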
Problem: “Rate limit exceeded” error from OpenAI
- Cause: Too many API calls in rapid succession
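- Fix: Add a retry mechanism with exponential backoff in agent.py. The OpenAI Python SDK already retries rate-limit errors twice by default with a short backoff, and `OpenAI(max_retries=...)` changes that count. The helper below adds your own handling on top:
```python
import time
from openai import RateLimitError


class AutonomousCodingAgent:
    # ... existing methods ...

    def _call_api_with_retry(self, max_retries: int = 3, **request_kwargs):
        """Call the API, waiting 1s, 2s, ... between rate-limited attempts."""
        for attempt in range(max_retries):
            try:
                # request_kwargs: the same model, messages, tools and tool_choice run() passes
                return self.client.chat.completions.create(**request_kwargs)
            except RateLimitError:
                if attempt < max_retries - 1:
                    wait_time = 2 ** attempt
                    print(f"Rate limited. Waiting {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    raise
```
In run(), replace the direct self.client.chat.completions.create(...) call with self._call_api_with_retry(...), passing the same keyword arguments.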
Problem: Agent creates files with syntax errors
- Cause: No validation step before marking complete
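- Fix: Add a validation tool that checks files for syntax errors before the agent marks the task complete (a linter such as ruff can go further):
```python
{
    "type": "function",
    "function": {
        "name": "validate_python",
        "description": "Check Python file for syntax errors",
        "parameters": {...}  # define like the other tools' parameters
    }
}
```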
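The handler behind that tool can use only the standard library. `ast.parse` raises `SyntaxError` with a line number without executing anything. This minimal sketch assumes the tool takes a single `filename` parameter, since the schema above leaves the parameters open. It also assumes the function is wired into `execute_tool` like the other handlers. `validate_python_file` is a hypothetical name.

```python
import ast
import os


def validate_python_file(workspace_dir: str, filename: str) -> str:
    """Report syntax errors in a workspace file without running it."""
    filepath = os.path.join(workspace_dir, filename)
    try:
        with open(filepath, "r") as f:
            source = f.read()
        ast.parse(source, filename=filename)
    except FileNotFoundError:
        return f"Error: File '{filename}' not found"
    except SyntaxError as e:
        return f"Syntax error in '{filename}' line {e.lineno}: {e.msg}"
    return f"'{filename}' has no syntax errors"
```

Returning a string in every case matches the other tools, so the model can read the line number and fix the file on its next iteration.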
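Problem: “File not found” when agent tries to execute code that imports its own files
- Cause: A relative workspace path (such as the default ./workspace) is passed as the script path while cwd is also set to the workspace. The child process therefore resolves the script from inside the workspace and looks for ./workspace/workspace/_temp_exec.py
- Fix: Use absolute paths in _execute_python. Setting PYTHONPATH to the workspace as well keeps its modules importable regardless of where the script file lives:
```python
workspace = os.path.abspath(self.workspace_dir)
temp_file = os.path.join(workspace, "_temp_exec.py")
result = subprocess.run(
    [sys.executable, temp_file],
    cwd=workspace,
    env={**os.environ, "PYTHONPATH": workspace},
    capture_output=True,
    text=True,
    timeout=10
)
```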
Problem: Agent exceeds token limit on complex tasks
- Cause: Full conversation history becomes too large
- Fix: Implement a summarization step:
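```python
class AutonomousCodingAgent:
    # ... existing methods ...

    def summarize_history(self, keep_recent: int = 6):  # keep_recent: illustrative default
        """Call from run() every 5 iterations: compress older messages into a summary."""
        old = self.conversation_history[1:-keep_recent]
        if not old:
            return
        recent = self.conversation_history[-keep_recent:]
        while recent and recent[0]["role"] == "tool":  # drop orphaned tool results
            recent.pop(0)
        transcript = "\n".join(f"{m['role']}: {m.get('content') or ''}" for m in old)
        summary_response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[{"role": "user", "content": f"Summarize this conversation in 3-4 sentences:\n{transcript}"}]
        )
        summary = summary_response.choices[0].message.content
        # Replace old messages with the summary; the system prompt stays first
        summary_msg = {"role": "user", "content": f"Summary of progress so far: {summary}"}
        self.conversation_history = [self.conversation_history[0], summary_msg] + recent
```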
Next Steps
You now have a working autonomous coding agent. Here are ways to extend it:
Add more tools:
- Git operations (commit, push, create branches)
- Package installation (pip install within workspace)
- Code formatting (black, ruff)
- Static analysis (mypy, pylint)
- Web search for documentation lookup
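Each new tool currently means another `elif` branch in `execute_tool`. One common alternative is a dictionary that maps tool names to handler functions, so adding a tool is a single registration. It is shown here as a sketch rather than a required change. `ToolRegistry` and the `word_count` example tool are hypothetical.

```python
from typing import Any, Callable, Dict


class ToolRegistry:
    """Maps tool names to handlers; unknown names and bad arguments become error strings."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._handlers[name] = func
            return func
        return decorator

    def execute(self, name: str, arguments: Dict[str, Any]) -> str:
        handler = self._handlers.get(name)
        if handler is None:
            return f"Error: Unknown tool '{name}'"
        try:
            # The model's JSON arguments map straight onto keyword parameters
            return handler(**arguments)
        except Exception as e:
            return f"Error executing {name}: {e}"


registry = ToolRegistry()


@registry.register("word_count")
def word_count(text: str) -> str:
    return f"{len(text.split())} words"


print(registry.execute("word_count", {"text": "add tools without elif chains"}))  # 5 words
print(registry.execute("git_push", {}))  # Error: Unknown tool 'git_push'
```

The schema returned by `get_available_tools` still has to list each tool, so the registry removes only the dispatch boilerplate.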
Improve decision-making:
- Implement a planning phase where the agent outlines steps before executing
- Add a “reflect” tool where the agent reviews its own code
- Create a feedback mechanism from test results
Add safety constraints:
- Implement a token budget tracker to prevent expensive runaway loops
- Add filesystem permission controls (read-only directories)
- Create a “human approval” step for certain operations
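For the token budget idea, each Chat Completions response reports its token usage in `response.usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`). A budget can therefore accumulate those numbers and stop the loop once a limit is passed. In this minimal sketch, `TokenBudget` and `BudgetExceeded` are hypothetical names, and the example uses a stand-in usage object instead of a real response.

```python
from dataclasses import dataclass
from types import SimpleNamespace


class BudgetExceeded(RuntimeError):
    """Raised when a run has used more tokens than allowed."""


@dataclass
class TokenBudget:
    max_total_tokens: int
    used: int = 0

    def record(self, usage) -> None:
        """Add one response's usage (any object with a total_tokens attribute)."""
        self.used += usage.total_tokens
        if self.used > self.max_total_tokens:
            raise BudgetExceeded(
                f"Used {self.used} tokens, budget is {self.max_total_tokens}"
            )


budget = TokenBudget(max_total_tokens=50_000)
budget.record(SimpleNamespace(total_tokens=12_000))  # stand-in for response.usage
print(budget.used)  # 12000
```

In run(), you would call `budget.record(response.usage)` right after each API call. The loop's broad `except Exception` would otherwise swallow `BudgetExceeded` and keep going, so add an `except BudgetExceeded: raise` clause before it.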
Challenge: Extend this agent to handle a full feature request—from reading existing code, to writing new functions, to updating tests, to committing changes. This requires orchestrating multiple iterations and maintaining context across a complex workflow.
For more on AI agents, see our related tutorials:
- “Building RAG Systems for Code Analysis”
- “LLM Observability: Monitoring Agent Behavior”
- “Function Calling Best Practices for Production”
The tutorial code itself lives in agent.py, main.py, tracker.py, and test_agent.py, and the files the agent generates are written to the workspace directory. Experiment with different tasks to see how the agent adapts its strategy.