
AI Agents: How Autonomous AI Systems Actually Work
ChatGPT answers questions. AI Agents plan, use tools, and complete tasks autonomously. Understanding this difference changes how you build with AI.


When you ask ChatGPT to "write a blog post," it writes one. But that's it. It doesn't save the file, create images, or publish to your blog. It just outputs text in a chat window.
Ask Claude Code the same thing? It creates a file, checks your directory structure, sets up image paths, and even makes a git commit. This difference is what separates chatbots from agents.
The first time I experienced this, it blew my mind. I realized LLMs don't just answer—they can actually do things. That made me curious about the architecture. How do agents plan, use tools, and complete tasks autonomously?
The answer came down to one word: loops. Agents don't respond once and stop. They observe → think → act → observe again, repeating until they achieve their goal. Understanding this structure clarified when agents are powerful and when they're overkill.
The lightbulb moment for me was this: Agents aren't functions. They're while-loops.
A typical LLM call looks like this:
# Chatbot: one call, one response
response = llm.chat("Write a blog post")
print(response) # Done
But an agent works like this:
# Agent: repeat until goal achieved
goal = "Write a blog post and save it as a file"
max_iterations = 10
history = []

for i in range(max_iterations):
    # 1. Observe: assess current state
    observation = get_current_state()
    # 2. Think: plan next action
    action = llm.plan(goal, observation, history)
    # 3. Act: execute tool
    result = execute_tool(action)
    history.append((action, result))
    # 4. Check: goal achieved?
    if is_goal_achieved(goal, result):
        break
This loop structure is the essence of agents. They're not question-answer systems—they're goal-oriented systems that act iteratively.
Think of it like navigation. A chatbot tells you "take Exit 3 and go straight." An agent actually walks, notices a blocked path, and reroutes in real-time.
The defining difference between agents and chatbots is tool usage. LLMs only output text. Agents call functions, hit APIs, read files, and execute code.
This works through function calling. Providers like OpenAI and Anthropic trained models to output structured JSON specifying which function to call with what parameters.
# Tool definitions provided to the agent
tools = [
    {
        "name": "write_file",
        "description": "Write content to a file",
        "parameters": {
            "path": "string",
            "content": "string"
        }
    },
    {
        "name": "search_web",
        "description": "Search the web for information",
        "parameters": {
            "query": "string"
        }
    }
]

# LLM decides to call a tool
response = llm.chat(
    "Write a blog post and save it to post.mdx",
    tools=tools
)

# LLM response
{
    "tool_call": {
        "name": "write_file",
        "arguments": {
            "path": "post.mdx",
            "content": "# My Blog Post\n..."
        }
    }
}

# Agent framework executes the tool
result = execute_tool(response.tool_call)
The key insight: LLMs don't execute tools directly. They just decide which tools to use. The agent framework handles execution. The LLM is the brain. Tools are the hands and feet.
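To make that split concrete, here is a minimal sketch of the framework side. The registry and the execute_tool helper are my own stand-ins, not a specific library's API; the LLM only names the tool, and this code actually runs it:
# Hypothetical tool implementations on the framework side
def write_file(path, content):
    with open(path, "w") as f:
        f.write(content)
    return f"Saved {path}"

def search_web(query):
    return f"(mock results for '{query}')"

TOOL_REGISTRY = {"write_file": write_file, "search_web": search_web}

def execute_tool(tool_call):
    # Look up the function the LLM asked for and run it with the LLM's arguments
    name, args = tool_call["name"], tool_call["arguments"]
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"
    return TOOL_REGISTRY[name](**args)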
When you say "build me a blog," an agent can't handle that in one step. It first breaks the goal down into smaller steps, like reading the existing directory structure, drafting the post, and writing the file.
There are two main approaches to planning.
Approach 1: Zero-shot Planning — LLM creates entire plan upfront
plan = llm.chat("""
Goal: Build a blog
Current state: src/content/posts/ directory exists
List the steps as JSON:
""")
# Response
[
    {"step": 1, "action": "read_directory", "args": {"path": "src/content/posts"}},
    {"step": 2, "action": "write_file", "args": {"path": "new-post.mdx"}},
    ...
]
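Executing a zero-shot plan is then just a matter of walking the list. A rough sketch, assuming the LLM's JSON response has been parsed and reusing the hypothetical execute_tool dispatcher from earlier:
import json

steps = json.loads(plan)  # parse the JSON list returned by the LLM

for step in steps:
    # Run each planned step in order
    result = execute_tool({"name": step["action"], "arguments": step["args"]})
    print(f"Step {step['step']}: {step['action']} -> {result}")
The weakness shows up immediately: if a step fails, the plan has no way to adapt, because it was fixed before any results came back. That is exactly what the second approach addresses.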
Approach 2: ReAct Pattern — Alternate between Reasoning and Acting
ReAct is my favorite pattern. It makes the agent explain its reasoning at each step.
# ReAct loop
thoughts = []
for step in range(max_steps):
    # Reasoning: what should I do next?
    thought = llm.chat(f"""
    Goal: {goal}
    Previous actions: {thoughts}
    Thought: Analyze the situation and decide next action.
    """)
    thoughts.append(thought)

    # Acting: execute tool if the thought names one
    if contains_action(thought):
        action = parse_action(thought)
        result = execute_tool(action)
        thoughts.append(f"Observation: {result}")
The power of ReAct is explicit reasoning. You can see why the agent made each choice. Incredibly useful for debugging.
A real ReAct log looks like this:
Thought: To write a blog post, I first need to check the format of existing posts.
Action: read_file("src/content/posts/example.mdx")
Observation: Confirmed frontmatter needs title, date, and tags.
Thought: Now I can write a new post with the same format.
Action: write_file("src/content/posts/new.mdx", content)
Observation: File successfully created.
Thought: Goal achieved.
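The contains_action and parse_action helpers in the ReAct loop above are left abstract. A minimal sketch, assuming the Action lines follow the tool("args") format shown in this log (both helper names are mine, not from any framework):
import re

def contains_action(thought_text):
    # The model signals a tool call by emitting an "Action:" line
    return "Action:" in thought_text

def parse_action(thought_text):
    # Matches lines like: Action: read_file("src/content/posts/example.mdx")
    match = re.search(r'Action:\s*(\w+)\((.*)\)', thought_text)
    if match is None:
        return None
    return {"name": match.group(1), "raw_args": match.group(2)}
In practice, structured function calling (the JSON format shown earlier) is far more reliable than parsing free text, which is why most frameworks combine ReAct-style reasoning with native tool calls.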
Agents use two types of memory.
Short-term Memory (conversation history)
Context for the current task. "What did I just do?"
conversation_history = [
    {"role": "user", "content": "Write a blog post"},
    {"role": "assistant", "content": "What topic?"},
    {"role": "user", "content": "About AI Agents"},
    {"role": "assistant", "content": "Thought: Starting post on AI Agents..."}
]
The problem: this grows unbounded. Even with 128k context windows, long tasks fill up fast. So you summarize or drop old messages.
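A common minimal strategy is to keep the recent turns verbatim and fold older ones into a summary. A sketch, using the hypothetical llm.chat wrapper from earlier:
def trim_history(history, keep_last=10):
    # Keep the most recent messages as-is; compress everything older into one summary turn
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    summary = llm.chat(
        "Summarize this conversation, keeping key decisions and results:\n"
        + "\n".join(f"{m['role']}: {m['content']}" for m in older)
    )
    return [{"role": "system", "content": f"Summary of earlier steps: {summary}"}] + recent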
Long-term Memory (vector database)
Storage for past tasks and knowledge. Uses RAG (Retrieval-Augmented Generation).
# Store past work
vector_db.store({
    "content": "Blog post frontmatter format",
    "metadata": {"date": "2026-01-10", "task": "blog_writing"}
})

# Retrieve relevant info
relevant_memories = vector_db.search(
    query="how to write blog posts",
    top_k=3
)

# Provide to agent
context = "\n".join([m.content for m in relevant_memories])
prompt = f"Relevant info:\n{context}\n\nGoal: {goal}"
Long-term memory lets agents reference "how I did this before." Powerful for repeated tasks.
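Under the hood, vector_db.search is just embedding similarity. A bare-bones sketch of the idea, assuming a hypothetical embed() function that maps a string to a vector:
import numpy as np

def search(memories, query, embed, top_k=3):
    # Rank stored memories by cosine similarity to the query embedding
    q = np.array(embed(query))
    scored = []
    for m in memories:
        v = np.array(embed(m["content"]))
        score = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
Real vector databases index the embeddings so lookups stay fast at scale, but the ranking idea is the same.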
Claude Code (what I'm using now) automates coding tasks. Say "add tests" and it reads the relevant files, writes the test code, runs it, and commits the result.
Software engineer agents take it further: hand them a GitHub issue and they work it end to end.
AutoGPT is a general-purpose goal achiever. Say "research competitors and create a report" and it searches the web, collects what it finds, and writes the report to a file.
The difference is domain specialization. Claude Code uses coding-specific tools (file read/write, git, terminal). AutoGPT uses general tools (web browsing, file system).
To truly understand agents, build one. Here's a minimal agent I created:
import openai
import json

class SimpleAgent:
    def __init__(self):
        # Map tool names (as the LLM sees them) to the functions that actually run
        self.tools = {
            "search": self.search_web,
            "calculate": self.calculate,
            "write_file": self.write_file
        }
        self.history = []

    def run(self, goal, max_iterations=5):
        """Agent main loop"""
        print(f"Goal: {goal}\n")
        for i in range(max_iterations):
            print(f"--- Iteration {i+1} ---")

            # 1. Think: decide next action
            response = openai.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": self.get_system_prompt()},
                    {"role": "user", "content": goal},
                    *self.history
                ],
                tools=self.get_tool_definitions()
            )
            message = response.choices[0].message

            # 2. Act: execute tool if called
            if message.tool_calls:
                for tool_call in message.tool_calls:
                    tool_name = tool_call.function.name
                    tool_args = json.loads(tool_call.function.arguments)
                    print(f"Action: {tool_name}({tool_args})")

                    # Execute tool
                    result = self.tools[tool_name](**tool_args)
                    print(f"Result: {result}\n")

                    # Add to history
                    self.history.append({
                        "role": "assistant",
                        "content": None,
                        "tool_calls": [tool_call.model_dump()]
                    })
                    self.history.append({
                        "role": "tool",
                        "content": str(result),
                        "tool_call_id": tool_call.id
                    })
            else:
                # No tool call = task complete
                print(f"Final Answer: {message.content}")
                break

    def get_system_prompt(self):
        return """You are an agent that uses tools to achieve goals.
Use the available tools step-by-step to achieve the goal.
When no more tools are needed, provide your final answer."""

    def get_tool_definitions(self):
        return [
            {
                "type": "function",
                "function": {
                    "name": "search",
                    "description": "Search the web for information",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string"}
                        },
                        "required": ["query"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "calculate",
                    "description": "Calculate a mathematical expression",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "expression": {"type": "string"}
                        },
                        "required": ["expression"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string"},
                            "content": {"type": "string"}
                        },
                        "required": ["path", "content"]
                    }
                }
            }
        ]

    def search_web(self, query):
        # In reality, would call Google API
        return f"Search results for '{query}' (mock)"

    def calculate(self, expression):
        # eval is fine for a demo, but never use it on untrusted input
        return eval(expression)

    def write_file(self, path, content):
        with open(path, 'w') as f:
            f.write(content)
        return f"File saved: {path}"

# Run it
agent = SimpleAgent()
agent.run("Calculate how many times bigger Seoul's population is than Busan's and save to result.txt")
A little over a hundred lines, but it contains all the essentials: loop, tool calling, history management. Scale this up and you get sophisticated agents.
Real problems I've hit using agents:
Hallucination
LLMs make things up. They try calling functions that don't exist or use wrong parameters.
# Agent tries to call
{"tool": "delete_all_files", "args": {}} # This tool doesn't exist!
Fix: clear tool definitions and validation.
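A sketch of that validation layer, assuming tool definitions shaped like the earlier tools list (the helper name is mine, not a library function): reject unknown tools and missing parameters before anything runs, and feed the error back to the LLM instead of crashing the loop.
def validate_tool_call(tool_call, tool_definitions):
    # Returns an error string the agent can show the LLM, or None if the call is valid
    defs = {t["name"]: t for t in tool_definitions}
    if tool_call["name"] not in defs:
        return f"Error: tool '{tool_call['name']}' does not exist. Available: {list(defs)}"
    required = defs[tool_call["name"]]["parameters"].keys()
    missing = [p for p in required if p not in tool_call["arguments"]]
    if missing:
        return f"Error: missing parameters {missing} for '{tool_call['name']}'"
    return None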
Infinite Loops
If agents can't achieve goals, they loop forever. "Read file" → "File not found" → "Read file again" → infinite.
Fix: max_iterations and detect repeated actions.
if action in recent_actions[-3:]:
    print("Repeating same action. Need different strategy.")
    break
Cost
Agents are expensive. One task can trigger dozens of LLM calls. Complex GPT-4 tasks burn through dollars fast.
Fix: use smaller models first (GPT-3.5 for planning, GPT-4 for execution) or cache responses.
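Caching is the easier of the two to sketch. Assuming deterministic prompts (temperature 0) and the hypothetical llm.chat wrapper again, identical calls can reuse a stored answer:
import hashlib

_cache = {}

def cached_chat(prompt):
    # Identical prompts reuse the stored answer instead of paying for another API call
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = llm.chat(prompt)
    return _cache[key]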
When to Use (and When NOT to Use)
Agents aren't always the answer. My rule of thumb: use agents when the task takes multiple steps, needs tools, and benefits from checking intermediate results; skip them when a single response will do. Using a calculator agent for "what's 1+1?" is overkill. Just ask the LLM. But "analyze our financial statements and create an investment report"? That's agent territory.
The most important lesson from studying agents:
Agent = LLM + Tools + Loop
The LLM is the brain. It thinks and decides. But it can't act. Tools are the hands and feet. They write files, call APIs, calculate. The loop is persistence. Not one-and-done, but try-until-success.
Combine these three and you go from "write me something" to "write it, save it, and deploy it."
With ChatGPT, I thought "this is convenient." With agents, I realized "AI can actually work now." That difference will drive massive change in the coming years.
Agents aren't perfect. They hallucinate, loop infinitely, and cost money. But the direction is right. Agents will get smarter, cheaper, and more reliable.
And how we design and control these agents will become the engineering discipline of the AI era. I'm convinced of that.