Most LLM tutorials teach you how to build a Chain: A -> B -> C. But real-world problems aren't linear. Think of a Chain like a train track — it goes one way. An Agent is more like a Roomba: it moves, bumps into a wall (error), turns around (corrects), and keeps going.
1. Chains vs. Agents
A Chain is hardcoded: the sequence of steps is fixed at design time.
An Agent uses an LLM as a reasoning engine to decide what to do next. The LLM, not your code, controls the flow.
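The difference is easy to see in plain Python, with no framework at all. A minimal sketch (the step functions and the `decide` stand-in are hypothetical; in a real agent an LLM plays the role of `decide`):

```python
def step_a(t): return t + "->a"
def step_b(t): return t + "->b"
def step_c(t): return t + "->c"

def chain(text):
    # Chain: fixed pipeline. Every step runs exactly once, in one direction.
    return step_c(step_b(step_a(text)))

def work(state):
    state["count"] += 1
    return state

def decide(state):
    # Stand-in for the LLM: keep looping until a stop condition holds.
    return None if state["count"] >= 3 else work

def agent(text):
    # Agent: a loop where a decision function picks the next step,
    # so the flow can revisit nodes (a cycle) until it decides to stop.
    state = {"input": text, "count": 0}
    while (step := decide(state)) is not None:
        state = step(state)
    return state
```

The chain always produces the same shape of execution; the agent's execution trace depends on decisions made at runtime.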
2. LangGraph Primitives
LangGraph extends LangChain by adding the ability to create cyclic graphs.
The Core Triad
- State: A shared dictionary (like a blackboard) that persists data across steps.
- Nodes: Python functions that perform work (e.g., call LLM, run tool).
- Edges: Control flow rules (e.g., "If tool called, go to ToolNode, else End").
3. Defining the Agent State
The state schema defines what data our graph keeps track of. We use `TypedDict` for type safety.
from typing import TypedDict, Annotated, List
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    # 'operator.add' ensures new messages are appended, not overwritten
    messages: Annotated[List[BaseMessage], operator.add]
    user_email: str
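The `Annotated[..., operator.add]` reducer is what makes `messages` append-only: when a node returns a partial update, keys with a reducer are merged with existing state, while plain keys are overwritten. A minimal sketch of that merge rule, in plain Python rather than the framework itself (the `merge` helper is hypothetical, for illustration):

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints

class S(TypedDict):
    messages: Annotated[List[str], operator.add]
    user: str

def merge(state: dict, update: dict) -> dict:
    # Apply each key's declared reducer if present, else overwrite —
    # this mimics how the graph folds a node's output into shared state.
    hints = get_type_hints(S, include_extras=True)
    out = dict(state)
    for key, value in update.items():
        meta = getattr(hints[key], "__metadata__", ())
        out[key] = meta[0](state[key], value) if meta else value
    return out

state = {"messages": ["hi"], "user": "a@example.com"}
state = merge(state, {"messages": ["hello!"], "user": "b@example.com"})
# 'messages' is appended to; 'user' is replaced
```

This is why nodes return only the keys they changed: the reducer, not the node, decides how the change lands.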
4. Coding the Nodes (The Researcher Agent)
Let's build a "Researcher" that can search the web using Tavily.
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

# 1. Bind Tools to Model
tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI(model="gpt-4o")
model_with_tools = model.bind_tools(tools)

# 2. Define the Agent Node
def call_model(state: AgentState):
    messages = state['messages']
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}
5. Conditional Edges (The "Brain")
This function checks if the LLM wants to call a tool or if it's done. This enables the Cycle.
def should_continue(state: AgentState):
    last_message = state['messages'][-1]
    if last_message.tool_calls:
        return "tools"  # Go to tool node
    return END  # Finish execution
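Because the router is a plain function over state, you can test it without building a graph at all. A sketch with a minimal stand-in message type (`Msg` is hypothetical; `END` in langgraph is a sentinel, represented here as a string):

```python
from dataclasses import dataclass, field

END = "__end__"  # stand-in for langgraph's END sentinel

@dataclass
class Msg:
    content: str
    tool_calls: list = field(default_factory=list)

def should_continue(state):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

route_a = should_continue({"messages": [Msg("", [{"name": "search"}])]})
route_b = should_continue({"messages": [Msg("All done.")]})
```

If the last message carries tool calls the graph cycles back through the tool node; otherwise it terminates.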
6. Human-in-the-Loop Implementation
What if the agent wants to delete a file or send an email? You don't want that happening automatically. LangGraph lets us add a breakpoint before a specific node.
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.prebuilt import ToolNode

# Setup Persistence
# (Note: recent langgraph versions expose from_conn_string as a
# context manager — `with SqliteSaver.from_conn_string(...) as memory:`)
memory = SqliteSaver.from_conn_string(":memory:")

# Define Graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("action", ToolNode(tools))
workflow.set_entry_point("agent")
# Map should_continue's return values onto node names
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "action", END: END},
)
workflow.add_edge("action", "agent")

# Compile with Interrupt
app = workflow.compile(
    checkpointer=memory,
    interrupt_before=["action"],  # STOP before running a tool!
)
Now, when you run this, the graph will pause exactly before executing the tool, giving you a chance to inspect the state and approve (or modify) the action. This is critical for enterprise safety.
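Conceptually, an interrupt is a checkpoint-and-pause: execution state is saved, control returns to the caller, and a later call resumes from exactly that point. A stdlib-only sketch of the pattern using a generator (this illustrates the mechanics, not the langgraph API):

```python
def run_agent(task):
    # Yield just before the "dangerous" step, handing control to a human.
    plan = f"send_email(to='boss', about={task!r})"
    approved = yield {"pending_action": plan}  # pause: checkpoint + wait
    if approved:
        yield {"result": f"executed: {plan}"}
    else:
        yield {"result": "aborted by reviewer"}

g = run_agent("Q3 report")
pause = next(g)        # execution stops before the action runs
# ... a human inspects pause["pending_action"] ...
final = g.send(True)   # approve and resume from the pause point
```

In LangGraph, the checkpointer plays the role of the generator's suspended frame: because the state was persisted, the approval can even happen in a different process, hours later.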
Error Handling & Retry Logic
In production, tools fail. APIs time out. LLMs hallucinate tool names. You need robust error handling baked into your agent loop, not bolted on as an afterthought.
from langgraph.prebuilt import ToolNode
from langchain_core.messages import AIMessage
from langchain_core.runnables import RunnableConfig

class ResilientToolNode(ToolNode):
    """A tool node that retries on transient failures."""

    def invoke(self, state, config: RunnableConfig = None, **kwargs):
        max_retries = 3
        for attempt in range(max_retries):
            try:
                return super().invoke(state, config, **kwargs)
            except TimeoutError:
                if attempt == max_retries - 1:
                    # Return an error message instead of crashing
                    return {"messages": [
                        AIMessage(content=f"Tool timed out after {max_retries} attempts.")
                    ]}
            except Exception as e:
                return {"messages": [
                    AIMessage(content=f"Tool error: {str(e)}. Trying alternative approach.")
                ]}
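The retry logic itself is independent of any framework and worth testing in isolation. A stdlib-only sketch of retry-with-exponential-backoff around a flaky callable (the `with_retries` helper is hypothetical):

```python
import time

def with_retries(fn, max_retries=3, base_delay=0.01):
    # Retry transient failures with exponential backoff; convert the
    # final failure into a value instead of raising, so a graph could
    # route the error back to the LLM as a message.
    for attempt in range(max_retries):
        try:
            return {"ok": True, "value": fn()}
        except TimeoutError:
            if attempt == max_retries - 1:
                return {"ok": False, "error": f"timed out after {max_retries} attempts"}
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky():
    # Fails twice, then succeeds — simulates a transient API timeout.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "search results"

result = with_retries(flaky)
```

Returning the failure as a value, not an exception, is the key design choice: it keeps the agent loop alive and lets the LLM decide what to do with the error.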
Deploying with LangServe
Once your agent works locally, you need to ship it. LangServe turns any LangChain/LangGraph runnable into a production API with streaming support, batch processing, and built-in playground.
from fastapi import FastAPI
from langserve import add_routes

api = FastAPI(title="Research Agent API")

# `compiled_graph` is the graph compiled above (workflow.compile(...))
add_routes(
    api,
    compiled_graph,
    path="/research",
    enable_feedback_endpoint=True,  # Collect user feedback
    playground_type="chat",
)

# Now accessible at:
# POST /research/invoke     (single run)
# POST /research/stream     (streaming)
# GET  /research/playground (interactive UI)
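A request to the invoke endpoint wraps the graph's input state under an `input` key, per LangServe's convention. An example request body (the message shape is an assumption matching the `AgentState` schema above):

```json
{
  "input": {
    "messages": [
      {"type": "human", "content": "What is the capital of France?"}
    ],
    "user_email": "user@example.com"
  }
}
```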
7. Agent Framework Comparison
LangGraph is not the only game in town. Here's how it stacks up against other popular agent frameworks in 2026.
| Feature | LangGraph | AutoGen | CrewAI |
|---|---|---|---|
| Graph Type | Cyclic (loops allowed) | Multi-agent chat | Sequential / Hierarchical |
| State Management | Built-in TypedDict | Shared context | Task-based |
| Human-in-Loop | Native breakpoints | Custom callbacks | Limited |
| Persistence | SQLite / Postgres checkpointer | Custom implementation | File-based |
| Best For | Complex tool-using agents | Multi-agent conversations | Role-playing workflows |
Key Takeaways
- Chains are linear; Agents are graphs. LangGraph's cyclic architecture lets your agent check its own work, retry, and self-correct — something linear chains fundamentally cannot do.
- State is everything. The TypedDict state schema is the "shared blackboard" that makes multi-step reasoning possible. Design it carefully.
- Human-in-the-Loop is non-negotiable. For any agent that can take destructive actions (delete files, send emails, execute code), always add `interrupt_before` breakpoints.
- Error handling > happy path. Production agents need resilient tool nodes that return errors as messages, not crashes.
- LangServe for deployment. Turn your graph into a streaming API with zero extra infrastructure code.