How to Build an Autonomous Coding Agent with Function Calling in 2026

Dev Nakamura 18 min read Updated May 24, 2026

TL;DR

  • Build a Python-based autonomous coding agent that can read, analyze, write, and modify code files without constant human prompting
  • Implement OpenAI’s function calling to give your agent tools for file operations, code execution, and self-directed task planning
  • Create a feedback loop where the agent evaluates its own work and iterates until task completion
  • By the end, you’ll have a working agent that can take a high-level request like “create a REST API for user management” and autonomously write the necessary files

Prerequisites

Before starting, ensure you have:

Required:

  • Python 3.11 or higher installed
  • An OpenAI API key with access to GPT-4 or GPT-4 Turbo (get one at platform.openai.com)
  • pip package manager
  • A code editor (VS Code recommended)
  • Basic terminal/command line familiarity

Knowledge:

  • Intermediate Python (functions, classes, error handling)
  • Understanding of API requests and JSON
  • Basic familiarity with async/await patterns
  • Conceptual knowledge of how LLMs work

Time: ~45 minutes

Cost: Approximately $0.10-0.50 per agent run depending on task complexity (GPT-4 Turbo pricing)

What We’re Building

We’re creating an autonomous coding agent that operates in a loop: plan → execute → evaluate → iterate. Unlike a simple chatbot, this agent makes its own decisions about which tools to use and when to stop.

The architecture looks like this:

User Request → Agent (GPT-4) → Function Selection → Tool Execution → Result Analysis → Decision (continue/complete) → Loop or Finish

This approach is worth learning because autonomous agents represent the next evolution beyond single-turn AI interactions. They can handle complex, multi-step tasks that would require dozens of back-and-forth prompts with a standard chatbot.
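To make the loop concrete before any API code appears, here is a minimal sketch that swaps GPT-4 for a scripted stand-in. The names run_loop, scripted_model, and the "finish" action are hypothetical and exist only for illustration; the agent built in the steps below uses OpenAI tool calls instead.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the model: given the history so far, it returns
# the next action. A real agent would call the LLM here instead.
def scripted_model(history: List[Dict[str, str]]) -> Dict[str, str]:
    steps_done = sum(1 for h in history if h["kind"] == "result")
    if steps_done == 0:
        return {"action": "write", "arg": "print('hi')"}
    if steps_done == 1:
        return {"action": "check", "arg": ""}
    return {"action": "finish", "arg": "wrote and checked one file"}


def run_loop(model: Callable, tools: Dict[str, Callable[[str], str]],
             request: str, max_iterations: int = 10) -> str:
    history = [{"kind": "request", "text": request}]
    for _ in range(max_iterations):
        decision = model(history)                  # plan: pick the next action
        if decision["action"] == "finish":         # decision: complete
            return decision["arg"]
        result = tools[decision["action"]](decision["arg"])   # execute
        history.append({"kind": "result", "text": result})    # feeds the next turn
    return "stopped: iteration limit reached"


files: Dict[str, str] = {}

def write_tool(content: str) -> str:
    files["hello.py"] = content
    return "wrote hello.py"

def check_tool(_: str) -> str:
    return "ok" if "hello.py" in files else "missing"

tools = {"write": write_tool, "check": check_tool}

summary = run_loop(scripted_model, tools, "create hello.py")
print(summary)  # wrote and checked one file
```

The important property is that the caller supplies only the request. Every later step is chosen by the model function from the accumulated history, and the iteration cap is the only external stop.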

Step 1: Set Up the Project Environment

Create a new directory for our agent and set up a virtual environment to keep dependencies isolated.

mkdir autonomous-coding-agent
cd autonomous-coding-agent
python -m venv venv

# On macOS/Linux:
source venv/bin/activate

# On Windows:
venv\Scripts\activate

Now install the required dependencies:

pip install openai==1.12.0 python-dotenv==1.0.0

Create a .env file to store your API key securely:

echo "OPENAI_API_KEY=your_api_key_here" > .env

Replace your_api_key_here with your actual OpenAI API key.

Expected output: You should see pip successfully install both packages with no errors. The .env file will be created in your project root.

Step 2: Create the Base Agent Class

Create a new file called agent.py. This will hold our main agent logic.

import os
import json
from typing import List, Dict, Any, Optional
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

class AutonomousCodingAgent:
    def __init__(self, workspace_dir: str = "./workspace"):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.workspace_dir = workspace_dir
        self.conversation_history: List[Dict[str, Any]] = []
        self.max_iterations = 10
        
        # Create workspace directory if it doesn't exist
        os.makedirs(self.workspace_dir, exist_ok=True)
        
        # System prompt that defines the agent's behavior
        self.system_prompt = """You are an autonomous coding agent. Your job is to complete programming tasks by:
1. Breaking down the request into steps
2. Using available tools to read/write files and execute code
3. Evaluating your own work
4. Iterating until the task is complete

When you believe the task is done, call the complete_task function with a summary.
Be thorough but efficient. Always test your code before marking complete."""
        
        self.conversation_history.append({
            "role": "system",
            "content": self.system_prompt
        })
    
    def add_message(self, role: str, content: str):
        """Add a message to the conversation history."""
        self.conversation_history.append({
            "role": role,
            "content": content
        })

This base class initializes our agent with:

  • An OpenAI client for API calls
  • A workspace directory where the agent can create files
  • Conversation history to maintain context across iterations
  • A system prompt that instructs the agent on its autonomous behavior

The max_iterations safeguard prevents infinite loops if the agent gets stuck.

Step 3: Define Agent Tools (Functions)

Add these tool definitions to agent.py. These are the functions our agent can call:

    def get_available_tools(self) -> List[Dict[str, Any]]:
        """Define all tools the agent can use."""
        return [
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file in the workspace",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "filename": {
                                "type": "string",
                                "description": "The name of the file to read (relative to workspace)"
                            }
                        },
                        "required": ["filename"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file in the workspace. Creates or overwrites the file.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "filename": {
                                "type": "string",
                                "description": "The name of the file to write (relative to workspace)"
                            },
                            "content": {
                                "type": "string",
                                "description": "The content to write to the file"
                            }
                        },
                        "required": ["filename", "content"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "list_files",
                    "description": "List all files in the workspace directory",
                    "parameters": {
                        "type": "object",
                        "properties": {},
                        "required": []
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "execute_python",
                    "description": "Execute Python code in the workspace and return the output. Use this to test code.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The Python code to execute"
                            }
                        },
                        "required": ["code"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "complete_task",
                    "description": "Call this when the task is fully complete. Provide a summary of what was accomplished.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "summary": {
                                "type": "string",
                                "description": "A summary of what was accomplished"
                            }
                        },
                        "required": ["summary"]
                    }
                }
            }
        ]

Each tool follows OpenAI’s function calling schema with:

  • A descriptive name
  • A clear description of what it does
  • A JSON Schema defining its parameters
  • Required vs. optional parameters

The complete_task function is critical—it’s how the agent signals it’s done, preventing endless loops.
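The model usually follows the schema, but its arguments are not guaranteed to include every required parameter. As a small sketch, a helper like the hypothetical missing_required below could check a call against the "required" list before a tool runs. It is not part of the OpenAI SDK, and WRITE_FILE_TOOL is a trimmed copy of the write_file definition above.

```python
from typing import Any, Dict, List

# Same shape as one entry returned by get_available_tools(), descriptions omitted
WRITE_FILE_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "write_file",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["filename", "content"],
        },
    },
}


def missing_required(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return the required parameter names that are absent from arguments."""
    required = tool["function"]["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required(WRITE_FILE_TOOL, {"filename": "app.py"}))                 # ['content']
print(missing_required(WRITE_FILE_TOOL, {"filename": "app.py", "content": ""}))  # []
```

Returning the missing names as a tool result, rather than raising, gives the model a chance to retry with corrected arguments.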

Step 4: Implement Tool Execution Logic

Add the actual implementations of these tools to agent.py:

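# Put these two imports at the top of agent.py with the other imports
import subprocess
import sys

# The methods below go inside the AutonomousCodingAgent class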

    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
        """Execute a tool and return its result."""
        try:
            if tool_name == "read_file":
                return self._read_file(arguments["filename"])
            
            elif tool_name == "write_file":
                return self._write_file(arguments["filename"], arguments["content"])
            
            elif tool_name == "list_files":
                return self._list_files()
            
            elif tool_name == "execute_python":
                return self._execute_python(arguments["code"])
            
            elif tool_name == "complete_task":
                return f"TASK_COMPLETE: {arguments['summary']}"
            
            else:
                return f"Error: Unknown tool '{tool_name}'"
        
        except Exception as e:
            return f"Error executing {tool_name}: {str(e)}"
    
    def _read_file(self, filename: str) -> str:
        filepath = os.path.join(self.workspace_dir, filename)
        try:
            with open(filepath, 'r') as f:
                content = f.read()
            return f"File '{filename}' contents:\n{content}"
        except FileNotFoundError:
            return f"Error: File '{filename}' not found"
        except Exception as e:
            return f"Error reading file: {str(e)}"
    
    def _write_file(self, filename: str, content: str) -> str:
        filepath = os.path.join(self.workspace_dir, filename)
        try:
            # Create subdirectories if needed
            os.makedirs(os.path.dirname(filepath), exist_ok=True)
            with open(filepath, 'w') as f:
                f.write(content)
            return f"Successfully wrote {len(content)} characters to '{filename}'"
        except Exception as e:
            return f"Error writing file: {str(e)}"
    
    def _list_files(self) -> str:
        try:
            files = []
            for root, dirs, filenames in os.walk(self.workspace_dir):
                for filename in filenames:
                    rel_path = os.path.relpath(
                        os.path.join(root, filename),
                        self.workspace_dir
                    )
                    files.append(rel_path)
            
            if not files:
                return "Workspace is empty"
            return "Files in workspace:\n" + "\n".join(f"- {f}" for f in files)
        except Exception as e:
            return f"Error listing files: {str(e)}"
    
    def _execute_python(self, code: str) -> str:
        """Execute Python code in a subprocess and return output."""
        # Use an absolute path: the child process runs with cwd set to the
        # workspace, so a relative path like ./workspace/_temp_exec.py would
        # be resolved against the workspace itself and not be found
        temp_file = os.path.abspath(
            os.path.join(self.workspace_dir, "_temp_exec.py")
        )
        try:
            # Write code to a temporary file in workspace
            with open(temp_file, 'w') as f:
                f.write(code)
            
            # Execute with timeout
            result = subprocess.run(
                [sys.executable, temp_file],
                cwd=self.workspace_dir,
                capture_output=True,
                text=True,
                timeout=10
            )
            
            output = result.stdout if result.stdout else "(no output)"
            if result.stderr:
                output += f"\nErrors:\n{result.stderr}"
            
            return f"Execution completed with return code {result.returncode}:\n{output}"
        
        except subprocess.TimeoutExpired:
            return "Error: Code execution timed out (10s limit)"
        except Exception as e:
            return f"Error executing code: {str(e)}"
        finally:
            # Clean up temp file even when execution fails or times out
            if os.path.exists(temp_file):
                os.remove(temp_file)

Key implementation details:

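  • File operations resolve paths relative to the workspace directory. Note that os.path.join alone does not stop a filename such as ../notes.txt or an absolute path from escaping it, so treat this as a convention rather than a true sandbox
  • Python execution runs in a subprocess with a 10-second timeout, which limits runaway scripts but does not restrict what the code can access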
  • Error handling at every level returns informative messages to the agent
  • All results are returned as strings that the LLM can understand

The model sees each result in the conversation history and uses it to decide its next action.
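Joining a filename onto the workspace path does not by itself keep the result inside the workspace. One way to enforce that, sketched here with a hypothetical helper named resolve_in_workspace, is to resolve the final path and compare it with the workspace root before reading or writing. This is an illustration built on the standard library, not code from the agent above.

```python
import os
import tempfile


def resolve_in_workspace(workspace_dir: str, filename: str) -> str:
    """Return an absolute path inside workspace_dir, or raise ValueError."""
    root = os.path.realpath(workspace_dir)
    candidate = os.path.realpath(os.path.join(root, filename))
    # commonpath compares whole path components, so a sibling directory
    # such as "/workspace-old" does not count as being inside "/workspace"
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"'{filename}' resolves outside the workspace")
    return candidate


workspace = tempfile.mkdtemp()
inside = resolve_in_workspace(workspace, "pkg/module.py")
print(inside.startswith(os.path.realpath(workspace)))  # True

try:
    resolve_in_workspace(workspace, "../outside.txt")
    escaped = True
except ValueError:
    escaped = False
print(escaped)  # False
```

_read_file and _write_file could call a helper like this in place of os.path.join and return the ValueError text to the model as an ordinary tool error.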

Step 5: Build the Main Agent Loop

Add the core autonomous loop logic to agent.py:

    def run(self, user_request: str) -> str:
        """Run the agent autonomously until task completion."""
        print(f"\n🤖 Agent starting task: {user_request}\n")
        
        # Add user request to conversation
        self.add_message("user", user_request)
        
        iteration = 0
        while iteration < self.max_iterations:
            iteration += 1
            print(f"\n--- Iteration {iteration} ---")
            
            # Get agent's next action
            try:
                response = self.client.chat.completions.create(
                    model="gpt-4-turbo-preview",
                    messages=self.conversation_history,
                    tools=self.get_available_tools(),
                    tool_choice="auto"
                )
                
                assistant_message = response.choices[0].message
                
                # Add assistant's response to history as a plain dict.
                # exclude_none drops tool_calls when it is None, since the
                # API expects that field to be a list or absent
                self.conversation_history.append(
                    assistant_message.model_dump(exclude_none=True)
                )
                
                # Check if agent wants to use tools
                if assistant_message.tool_calls:
                    # Execute each tool call
                    for tool_call in assistant_message.tool_calls:
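                        tool_name = tool_call.function.name
                        print(f"\n🔧 Calling tool: {tool_name}")
                        
                        # Malformed JSON arguments become a tool result, so every
                        # tool call still receives the reply the API expects
                        try:
                            arguments = json.loads(tool_call.function.arguments)
                            print(f"   Arguments: {arguments}")
                            
                            # Execute the tool
                            result = self.execute_tool(tool_name, arguments)
                        except json.JSONDecodeError as e:
                            result = f"Error: could not parse arguments for {tool_name}: {e}"
                        print(f"   Result: {result[:200]}{'...' if len(result) > 200 else ''}")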
                        
                        # Check for completion
                        if result.startswith("TASK_COMPLETE:"):
                            summary = result.replace("TASK_COMPLETE:", "").strip()
                            print(f"\n✅ Task completed: {summary}")
                            return summary
                        
                        # Add tool result to conversation
                        self.conversation_history.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": result
                        })
                
                # If no tool calls, agent is just thinking
                elif assistant_message.content:
                    print(f"\n💭 Agent thinking: {assistant_message.content}")
                
            except Exception as e:
                error_msg = f"Error in agent loop: {str(e)}"
                print(f"\n{error_msg}")
                self.add_message("user", f"An error occurred: {error_msg}. Please try a different approach.")
        
        return f"Task incomplete: Reached maximum iterations ({self.max_iterations})"

This loop:

  1. Sends the conversation history to GPT-4 with available tools
  2. Processes any tool calls the model requests
  3. Adds tool results back to the conversation
  4. Continues until complete_task is called or max iterations reached

The agent is truly autonomous—we never prompt it with “what next?” It decides on its own.
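One constraint shapes this loop: the Chat Completions API expects every tool call in an assistant message to be answered by a later "tool" message carrying the same tool_call_id. The sketch below checks a history for unanswered calls. The helper name unanswered_tool_calls and the ids call_1 and call_2 are made up for illustration.

```python
from typing import Any, Dict, List


def unanswered_tool_calls(messages: List[Dict[str, Any]]) -> List[str]:
    """Return ids of tool calls that have no matching tool message."""
    requested: List[str] = []
    answered = set()
    for message in messages:
        if message["role"] == "assistant":
            for call in message.get("tool_calls") or []:
                requested.append(call["id"])
        elif message["role"] == "tool":
            answered.add(message["tool_call_id"])
    return [call_id for call_id in requested if call_id not in answered]


history = [
    {"role": "system", "content": "You are an autonomous coding agent."},
    {"role": "user", "content": "Create hello.py"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "write_file", "arguments": "{}"}},
        {"id": "call_2", "type": "function",
         "function": {"name": "list_files", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Successfully wrote file"},
]

print(unanswered_tool_calls(history))  # ['call_2']
```

Running a check like this before each request can help when debugging 400 errors that appear after an exception interrupts the tool loop.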

Step 6: Create the Main Script

Create a new file called main.py to run the agent:

from agent import AutonomousCodingAgent

def main():
    # Initialize agent with workspace directory
    agent = AutonomousCodingAgent(workspace_dir="./workspace")
    
    # Example task: Create a simple web scraper
    task = """
    Create a Python web scraper that:
    1. Takes a URL as input
    2. Fetches the page content
    3. Extracts all links from the page
    4. Saves the links to a JSON file
    
    Include proper error handling and a main function to test it.
    """
    
    # Run the agent
    result = agent.run(task)
    
    print("\n" + "="*60)
    print("FINAL RESULT:")
    print(result)
    print("="*60)

if __name__ == "__main__":
    main()

Run the agent:

python main.py

Expected output: You’ll see the agent working through iterations:

🤖 Agent starting task: Create a Python web scraper...

--- Iteration 1 ---

🔧 Calling tool: write_file
   Arguments: {'filename': 'scraper.py', 'content': '...'}
   Result: Successfully wrote 1247 characters to 'scraper.py'

--- Iteration 2 ---

🔧 Calling tool: write_file
   Arguments: {'filename': 'requirements.txt', 'content': 'requests==2.31.0\nbeautifulsoup4==4.12.0'}
   Result: Successfully wrote 39 characters to 'requirements.txt'

--- Iteration 3 ---

🔧 Calling tool: execute_python
...

✅ Task completed: Created a web scraper with error handling, link extraction, and JSON export

The agent will create files in the workspace/ directory that you can inspect and run.

Step 7: Add Memory and Context Management

For longer tasks, conversation history grows large. Add message trimming to agent.py to keep it bounded:

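    def trim_conversation_history(self, max_messages: int = 20):
        """Keep conversation history manageable by trimming old messages."""
        if len(self.conversation_history) <= max_messages:
            return
        
        # Always keep the system message and the original user request
        head = self.conversation_history[:2]
        
        # Keep most recent messages
        keep = max(max_messages - 2, 1)
        tail = self.conversation_history[-keep:]
        
        # A tool result must follow the assistant message that requested it,
        # so drop any tool results left orphaned at the start of the tail
        while tail and tail[0]["role"] == "tool":
            tail.pop(0)
        
        self.conversation_history = head + tail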
    
    def run(self, user_request: str) -> str:
        """Run the agent autonomously until task completion."""
        print(f"\n🤖 Agent starting task: {user_request}\n")
        
        self.add_message("user", user_request)
        
        iteration = 0
        while iteration < self.max_iterations:
            iteration += 1
            print(f"\n--- Iteration {iteration} ---")
            
            # Trim history to stay within token limits
            self.trim_conversation_history(max_messages=20)
            
            # ... rest of the loop as before

This reduces the risk of token limit errors on complex tasks while keeping enough context for the agent to stay on track.
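Message counts are only a proxy for context size, because a single large file read can outweigh many short messages. As a rough sketch, trimming can instead target an estimated token budget. The four-characters-per-token figure below is a common rule of thumb rather than a real tokenizer, and the names estimate_tokens and trim_to_budget are hypothetical.

```python
from typing import Any, Dict, List


def estimate_tokens(message: Dict[str, Any]) -> int:
    # Rough heuristic, not a real tokenizer: about 4 characters per token
    return len(str(message.get("content") or "")) // 4 + 1


def trim_to_budget(messages: List[Dict[str, Any]], max_tokens: int) -> List[Dict[str, Any]]:
    """Keep the first message and as many recent messages as fit the budget."""
    head, rest = messages[:1], list(messages[1:])
    while rest and sum(map(estimate_tokens, head + rest)) > max_tokens:
        rest.pop(0)                       # drop the oldest message first
        while rest and rest[0]["role"] == "tool":
            rest.pop(0)                   # never start with an orphaned tool result
    return head + rest


history = [
    {"role": "system", "content": "You are an agent."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "short follow-up"},
]
trimmed = trim_to_budget(history, max_tokens=120)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

For accurate counts, a real tokenizer would replace estimate_tokens; the trimming logic itself would stay the same.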

Step 8: Implement Progress Tracking

Create tracker.py to monitor agent progress:

import json
from datetime import datetime
from typing import List, Dict, Any

class ProgressTracker:
    def __init__(self, log_file: str = "agent_progress.json"):
        self.log_file = log_file
        self.events: List[Dict[str, Any]] = []
    
    def log_event(self, event_type: str, details: Dict[str, Any]):
        """Log an agent event with timestamp."""
        event = {
            "timestamp": datetime.now().isoformat(),
            "type": event_type,
            "details": details
        }
        self.events.append(event)
        
        # Write to file
        with open(self.log_file, 'w') as f:
            json.dump(self.events, f, indent=2)
    
    def get_summary(self) -> Dict[str, Any]:
        """Generate a summary of agent activity."""
        tool_calls = [e for e in self.events if e["type"] == "tool_call"]
        errors = [e for e in self.events if e["type"] == "error"]
        
        return {
            "total_events": len(self.events),
            "tool_calls": len(tool_calls),
            "errors": len(errors),
            "duration_seconds": self._calculate_duration()
        }
    
    def _calculate_duration(self) -> float:
        if len(self.events) < 2:
            return 0.0
        
        start = datetime.fromisoformat(self.events[0]["timestamp"])
        end = datetime.fromisoformat(self.events[-1]["timestamp"])
        return (end - start).total_seconds()

Integrate the tracker into agent.py:

from tracker import ProgressTracker

class AutonomousCodingAgent:
    def __init__(self, workspace_dir: str = "./workspace"):
        # ... existing initialization
        self.tracker = ProgressTracker()
    
    def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
        self.tracker.log_event("tool_call", {
            "tool": tool_name,
            "arguments": arguments
        })
        # ... existing tool execution

Now you can review agent_progress.json after each run to analyze the agent’s decision-making process.
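As a small example of that review, the sketch below counts how often each tool was called. The tool_usage helper is hypothetical, and the sample events are made up in the format that ProgressTracker.log_event writes; a real run would load agent_progress.json from the project directory instead of a temporary file.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Any, Dict, List


def tool_usage(events: List[Dict[str, Any]]) -> Counter:
    """Count how many times each tool appears in tool_call events."""
    return Counter(
        e["details"]["tool"] for e in events if e["type"] == "tool_call"
    )


# Made-up sample events in the tracker's format
sample = [
    {"timestamp": "2026-05-24T10:00:00", "type": "tool_call",
     "details": {"tool": "write_file", "arguments": {"filename": "a.py"}}},
    {"timestamp": "2026-05-24T10:00:05", "type": "tool_call",
     "details": {"tool": "execute_python", "arguments": {"code": "print(1)"}}},
    {"timestamp": "2026-05-24T10:00:09", "type": "tool_call",
     "details": {"tool": "write_file", "arguments": {"filename": "b.py"}}},
]

log_path = os.path.join(tempfile.mkdtemp(), "agent_progress.json")
with open(log_path, "w") as f:
    json.dump(sample, f)

with open(log_path) as f:
    usage = tool_usage(json.load(f))
print(usage.most_common())  # [('write_file', 2), ('execute_python', 1)]
```

A tool that dominates the counts without the task finishing is often the first sign that the agent is looping.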

Testing Your Implementation

Create test_agent.py to verify functionality:

import os
import shutil
from agent import AutonomousCodingAgent

def test_simple_file_creation():
    """Test that agent can create a simple file."""
    # Clean workspace
    if os.path.exists("./test_workspace"):
        shutil.rmtree("./test_workspace")
    
    agent = AutonomousCodingAgent(workspace_dir="./test_workspace")
    
    task = "Create a file called 'hello.py' that prints 'Hello, World!' when run."
    result = agent.run(task)
    
    # Verify file exists
    assert os.path.exists("./test_workspace/hello.py"), "File was not created"
    
    # Verify content
    with open("./test_workspace/hello.py", 'r') as f:
        content = f.read()
        assert "Hello, World!" in content, "Content doesn't match"
    
    print("✅ Test passed: Simple file creation")

def test_multi_file_project():
    """Test that agent can create a multi-file project."""
    if os.path.exists("./test_workspace"):
        shutil.rmtree("./test_workspace")
    
    agent = AutonomousCodingAgent(workspace_dir="./test_workspace")
    
    task = """
    Create a simple calculator package with:
    - calculator.py with add, subtract, multiply, divide functions
    - test_calculator.py with tests for each function
    - __init__.py to make it a package
    """
    
    result = agent.run(task)
    
    # Verify all files exist
    assert os.path.exists("./test_workspace/calculator.py")
    assert os.path.exists("./test_workspace/test_calculator.py")
    assert os.path.exists("./test_workspace/__init__.py")
    
    print("✅ Test passed: Multi-file project creation")

if __name__ == "__main__":
    test_simple_file_creation()
    test_multi_file_project()
    print("\n🎉 All tests passed!")

Run the tests:

python test_agent.py

Expected output:

✅ Test passed: Simple file creation
✅ Test passed: Multi-file project creation

🎉 All tests passed!

Common Issues & Fixes

Problem: Agent gets stuck in a loop calling the same tool repeatedly

  • Cause: The system prompt doesn’t emphasize evaluation of results
  • Fix: Update the system prompt to explicitly require result evaluation:
self.system_prompt = """...After each tool use, evaluate if it succeeded. If a tool fails twice, try a different approach..."""
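Prompt changes can be backed by a programmatic check. The sketch below flags the case where the last few tool calls are identical, so the loop can inject a corrective message or stop. The helpers call_signature and is_stuck are hypothetical, and the threshold of three repeats is an arbitrary choice.

```python
import json
from typing import Any, Dict, List, Tuple


def call_signature(tool_name: str, arguments: Dict[str, Any]) -> Tuple[str, str]:
    # sort_keys makes the signature independent of argument order
    return tool_name, json.dumps(arguments, sort_keys=True)


def is_stuck(calls: List[Tuple[str, str]], repeats: int = 3) -> bool:
    """True when the last `repeats` calls are identical."""
    if len(calls) < repeats:
        return False
    return len(set(calls[-repeats:])) == 1


calls = [
    call_signature("list_files", {}),
    call_signature("read_file", {"filename": "app.py"}),
    call_signature("read_file", {"filename": "app.py"}),
]
print(is_stuck(calls))  # False

calls.append(call_signature("read_file", {"filename": "app.py"}))
print(is_stuck(calls))  # True
```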

Problem: “Rate limit exceeded” error from OpenAI

  • Cause: Too many API calls in rapid succession
  • Fix: Add a retry mechanism with exponential backoff in agent.py:
import time
from openai import RateLimitError

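def _call_api_with_retry(self, max_retries=3, **request_kwargs):
    # request_kwargs are passed straight through to chat.completions.create
    # (model, messages, tools, tool_choice)
    for attempt in range(max_retries):
        try:
            return self.client.chat.completions.create(**request_kwargs)
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: 1s, 2s, 4s, ...
                wait_time = 2 ** attempt
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise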

Problem: Agent creates files with syntax errors

  • Cause: No validation step before marking complete
  • Fix: Add a validation tool that checks generated files for syntax errors (a linter can be layered on later):
{
    "name": "validate_python",
    "description": "Check Python file for syntax errors",
    "parameters": {...}
}
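The tool definition above leaves its implementation open. One possible backing function, sketched here with the standard library's ast module, parses the source without executing it. The function name matches the tool name, but the signature and return strings are assumptions made for this example.

```python
import ast


def validate_python(source: str, filename: str = "<agent>") -> str:
    """Return 'OK' or a one-line description of the first syntax error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as e:
        return f"Syntax error in {filename} at line {e.lineno}: {e.msg}"
    return "OK"


good = validate_python("def add(a, b):\n    return a + b\n")
bad = validate_python("def add(a, b)\n    return a + b\n", "calc.py")
print(good)  # OK
print(bad)   # starts with "Syntax error in calc.py at line 1"; the message text varies by Python version
```

Parsing catches syntax errors only. Undefined names, bad imports, and logic bugs still need execution or a linter to surface.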

Problem: “File not found” when agent tries to execute code that imports its own files

  • Cause: Python path not set correctly for subprocess
  • Fix: Update _execute_python to set PYTHONPATH:
result = subprocess.run(
    [sys.executable, temp_file],
    cwd=self.workspace_dir,
    env={**os.environ, "PYTHONPATH": os.path.abspath(self.workspace_dir)},
    ...
)

Problem: Agent exceeds token limit on complex tasks

  • Cause: Full conversation history becomes too large
  • Fix: Implement a summarization step:
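def summarize_history(self, keep_recent=6):
    # Every 5 iterations, ask GPT to summarize progress
    old_messages = self.conversation_history[1:-keep_recent]
    if not old_messages:
        return
    summary_response = self.client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation in 3-4 sentences: {old_messages}"
        }]
    )
    summary = summary_response.choices[0].message.content
    # Replace old messages with summary, keeping the system prompt and recent turns
    recent = self.conversation_history[-keep_recent:]
    while recent and recent[0]["role"] == "tool":
        recent.pop(0)  # a tool result cannot appear without its assistant message
    self.conversation_history = [
        self.conversation_history[0],
        {"role": "user", "content": f"Progress so far: {summary}"},
    ] + recent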

Next Steps

You now have a working autonomous coding agent. Here are ways to extend it:

Add more tools:

  • Git operations (commit, push, create branches)
  • Package installation (pip install within workspace)
  • Code formatting (black, ruff)
  • Static analysis (mypy, pylint)
  • Web search for documentation lookup

Improve decision-making:

  • Implement a planning phase where the agent outlines steps before executing
  • Add a “reflect” tool where the agent reviews its own code
  • Create a feedback mechanism from test results

Add safety constraints:

  • Implement a token budget tracker to prevent expensive runaway loops
  • Add filesystem permission controls (read-only directories)
  • Create a “human approval” step for certain operations
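As a starting point for the first safety item, here is a minimal sketch of a token budget tracker. The TokenBudget class is hypothetical and the usage figures are made up. In the real loop, the number passed to record would come from response.usage.total_tokens on each Chat Completions response.

```python
class TokenBudget:
    """Track cumulative token usage and signal when a limit is reached."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, total_tokens: int) -> None:
        self.used += total_tokens

    @property
    def exhausted(self) -> bool:
        return self.used >= self.max_tokens


budget = TokenBudget(max_tokens=5000)
for tokens_this_call in (1800, 2100, 1500):   # made-up usage figures
    budget.record(tokens_this_call)
    if budget.exhausted:
        break                                  # the agent loop would stop here

print(budget.used, budget.exhausted)  # 5400 True
```

Checking the budget at the top of each iteration complements max_iterations: one caps the number of steps, the other caps their total cost.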

Challenge: Extend this agent to handle a full feature request—from reading existing code, to writing new functions, to updating tests, to committing changes. This requires orchestrating multiple iterations and maintaining context across a complex workflow.

For more on AI agents, see our related tutorials:

  • “Building RAG Systems for Code Analysis”
  • “LLM Observability: Monitoring Agent Behavior”
  • “Function Calling Best Practices for Production”

Everything the agent generates is written to the workspace directory, so inspect that folder after each run. Experiment with different tasks to see how the agent adapts its strategy.
