Most LLM tutorials teach you how to build a Chain: A -> B -> C. But real-world problems aren't linear. Think of a Chain like a train track — it goes one way. An Agent is more like a Roomba: it moves, bumps into a wall (error), turns around (corrects), and keeps going.
1. Chains vs. Agents
A Chain is hardcoded: the sequence of steps is fixed at design time.
An Agent uses an LLM as a reasoning engine to decide what to do next. The LLM, not your code, controls the flow.
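The difference is easy to see in plain Python, with no framework at all. A minimal sketch (the step functions and the `decide` stand-in are hypothetical; in a real agent an LLM plays the role of `decide`):

```python
def step_a(t): return t + "->a"
def step_b(t): return t + "->b"
def step_c(t): return t + "->c"

def chain(text):
    # Chain: fixed pipeline. Every step runs exactly once, in one direction.
    return step_c(step_b(step_a(text)))

def work(state):
    state["count"] += 1
    return state

def decide(state):
    # Stand-in for the LLM: keep looping until a stop condition holds.
    return None if state["count"] >= 3 else work

def agent(text):
    # Agent: a loop where a decision function picks the next step,
    # so the flow can revisit nodes (a cycle) until it decides to stop.
    state = {"input": text, "count": 0}
    while (step := decide(state)) is not None:
        state = step(state)
    return state
```

The chain always produces the same shape of execution; the agent's execution trace depends on decisions made at runtime.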
2. LangGraph Primitives
LangGraph extends LangChain by adding the ability to create cyclic graphs.
The Core Triad
- State: A shared dictionary (like a blackboard) that persists data across steps.
- Nodes: Python functions that perform work (e.g., call LLM, run tool).
- Edges: Control flow rules (e.g., "If tool called, go to ToolNode, else End").
3. Defining the Agent State
The state schema defines what data our graph keeps track of. We use `TypedDict` for type safety.
from typing import TypedDict, Annotated, List
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    # 'operator.add' ensures new messages are appended, not overwritten
    messages: Annotated[List[BaseMessage], operator.add]
    user_email: str
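The `Annotated[..., operator.add]` reducer is what makes `messages` append-only: when a node returns a partial update, keys with a reducer are merged with existing state, while plain keys are overwritten. A minimal sketch of that merge rule, in plain Python rather than the framework itself (the `merge` helper is hypothetical, for illustration):

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints

class S(TypedDict):
    messages: Annotated[List[str], operator.add]
    user: str

def merge(state: dict, update: dict) -> dict:
    # Apply each key's declared reducer if present, else overwrite —
    # this mimics how the graph folds a node's output into shared state.
    hints = get_type_hints(S, include_extras=True)
    out = dict(state)
    for key, value in update.items():
        meta = getattr(hints[key], "__metadata__", ())
        out[key] = meta[0](state[key], value) if meta else value
    return out

state = {"messages": ["hi"], "user": "a@example.com"}
state = merge(state, {"messages": ["hello!"], "user": "b@example.com"})
# 'messages' is appended to; 'user' is replaced
```

This is why nodes return only the keys they changed: the reducer, not the node, decides how the change lands.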
4. Coding the Nodes (The Researcher Agent)
Let's build a "Researcher" that can search the web using Tavily.
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

# 1. Bind Tools to Model
tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI(model="gpt-4o")
model_with_tools = model.bind_tools(tools)

# 2. Define the Agent Node
def call_model(state: AgentState):
    messages = state['messages']
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}
5. Conditional Edges (The "Brain")
This function checks if the LLM wants to call a tool or if it's done. This enables the Cycle.
def should_continue(state: AgentState):
    last_message = state['messages'][-1]
    if last_message.tool_calls:
        return "tools"  # Go to tool node
    return END  # Finish execution
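Because the router is a plain function over state, you can test it without building a graph at all. A sketch with a minimal stand-in message type (`Msg` is hypothetical; `END` in langgraph is a sentinel, represented here as a string):

```python
from dataclasses import dataclass, field

END = "__end__"  # stand-in for langgraph's END sentinel

@dataclass
class Msg:
    content: str
    tool_calls: list = field(default_factory=list)

def should_continue(state):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

route_a = should_continue({"messages": [Msg("", [{"name": "search"}])]})
route_b = should_continue({"messages": [Msg("All done.")]})
```

If the last message carries tool calls the graph cycles back through the tool node; otherwise it terminates.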
6. Human-in-the-Loop Implementation
What if the agent wants to delete a file or send an email? You don't want that happening automatically. LangGraph lets us add a breakpoint before a specific node.
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.prebuilt import ToolNode

# Setup Persistence
# (Note: recent langgraph versions expose from_conn_string as a
# context manager — `with SqliteSaver.from_conn_string(...) as memory:`)
memory = SqliteSaver.from_conn_string(":memory:")

# Define Graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("action", ToolNode(tools))
workflow.set_entry_point("agent")
# Map should_continue's return values onto node names
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "action", END: END},
)
workflow.add_edge("action", "agent")

# Compile with Interrupt
app = workflow.compile(
    checkpointer=memory,
    interrupt_before=["action"],  # STOP before running a tool!
)
Now, when you run this, the graph will pause exactly before executing the tool, giving you a chance to inspect the state and approve (or modify) the action. This is critical for enterprise safety.
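Conceptually, an interrupt is a checkpoint-and-pause: execution state is saved, control returns to the caller, and a later call resumes from exactly that point. A stdlib-only sketch of the pattern using a generator (this illustrates the mechanics, not the langgraph API):

```python
def run_agent(task):
    # Yield just before the "dangerous" step, handing control to a human.
    plan = f"send_email(to='boss', about={task!r})"
    approved = yield {"pending_action": plan}  # pause: checkpoint + wait
    if approved:
        yield {"result": f"executed: {plan}"}
    else:
        yield {"result": "aborted by reviewer"}

g = run_agent("Q3 report")
pause = next(g)        # execution stops before the action runs
# ... a human inspects pause["pending_action"] ...
final = g.send(True)   # approve and resume from the pause point
```

In LangGraph, the checkpointer plays the role of the generator's suspended frame: because the state was persisted, the approval can even happen in a different process, hours later.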
Error Handling & Retry Logic
In production, tools fail. APIs time out. LLMs hallucinate tool names. You need robust error handling baked into your agent loop, not bolted on as an afterthought.
from langgraph.prebuilt import ToolNode
from langchain_core.messages import AIMessage
from langchain_core.runnables import RunnableConfig

class ResilientToolNode(ToolNode):
    """A tool node that retries on transient failures."""

    def invoke(self, state, config: RunnableConfig = None, **kwargs):
        max_retries = 3
        for attempt in range(max_retries):
            try:
                return super().invoke(state, config, **kwargs)
            except TimeoutError:
                if attempt == max_retries - 1:
                    # Return an error message instead of crashing
                    return {"messages": [
                        AIMessage(content=f"Tool timed out after {max_retries} attempts.")
                    ]}
            except Exception as e:
                return {"messages": [
                    AIMessage(content=f"Tool error: {str(e)}. Trying alternative approach.")
                ]}
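The retry logic itself is independent of any framework and worth testing in isolation. A stdlib-only sketch of retry-with-exponential-backoff around a flaky callable (the `with_retries` helper is hypothetical):

```python
import time

def with_retries(fn, max_retries=3, base_delay=0.01):
    # Retry transient failures with exponential backoff; convert the
    # final failure into a value instead of raising, so a graph could
    # route the error back to the LLM as a message.
    for attempt in range(max_retries):
        try:
            return {"ok": True, "value": fn()}
        except TimeoutError:
            if attempt == max_retries - 1:
                return {"ok": False, "error": f"timed out after {max_retries} attempts"}
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky():
    # Fails twice, then succeeds — simulates a transient API timeout.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "search results"

result = with_retries(flaky)
```

Returning the failure as a value, not an exception, is the key design choice: it keeps the agent loop alive and lets the LLM decide what to do with the error.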
Deploying with LangServe
Once your agent works locally, you need to ship it. LangServe turns any LangChain/LangGraph runnable into a production API with streaming support, batch processing, and built-in playground.
from fastapi import FastAPI
from langserve import add_routes

api = FastAPI(title="Research Agent API")

# `compiled_graph` is the graph compiled above (workflow.compile(...))
add_routes(
    api,
    compiled_graph,
    path="/research",
    enable_feedback_endpoint=True,  # Collect user feedback
    playground_type="chat",
)

# Now accessible at:
# POST /research/invoke     (single run)
# POST /research/stream     (streaming)
# GET  /research/playground (interactive UI)
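A request to the invoke endpoint wraps the graph's input state under an `input` key, per LangServe's convention. An example request body (the message shape is an assumption matching the `AgentState` schema above):

```json
{
  "input": {
    "messages": [
      {"type": "human", "content": "What is the capital of France?"}
    ],
    "user_email": "user@example.com"
  }
}
```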
7. Agent Framework Comparison
LangGraph is not the only game in town. Here's how it stacks up against other popular agent frameworks in 2026.
| Feature | LangGraph | AutoGen | CrewAI |
|---|---|---|---|
| Graph Type | Cyclic (loops allowed) | Multi-agent chat | Sequential / Hierarchical |
| State Management | Built-in TypedDict | Shared context | Task-based |
| Human-in-Loop | Native breakpoints | Custom callbacks | Limited |
| Persistence | SQLite / Postgres checkpointer | Custom implementation | File-based |
| Best For | Complex tool-using agents | Multi-agent conversations | Role-playing workflows |
Key Takeaways
- Chains are linear; Agents are graphs. LangGraph's cyclic architecture lets your agent check its own work, retry, and self-correct — something linear chains fundamentally cannot do.
- State is everything. The TypedDict state schema is the "shared blackboard" that makes multi-step reasoning possible. Design it carefully.
- Human-in-the-Loop is non-negotiable. For any agent that can take destructive actions (delete files, send emails, execute code), always add `interrupt_before` breakpoints.
- Error handling > happy path. Production agents need resilient tool nodes that return errors as messages, not crashes.
- LangServe for deployment. Turn your graph into a streaming API with zero extra infrastructure code.