Agent Systems
The session frames agentic AI as an evolution of basic RAG for business workflows. It starts by contrasting a static RAG pipeline—query, retrieve, answer—with an AI agent that interprets goals, plans multi-step processes, chooses among tools (including RAG), and iterates based on feedback until the goal is satisfied.
| Speaker | Text |
|---|---|
| Alex | Welcome back to the Deep Dive. Today we are unpacking a shift in the AI landscape that, you know, it feels almost inevitable when you look back at it. But right now in 2026, it is completely reshaping how we build software. We’re officially moving beyond the chatbot. |
| Sam | It’s about time, honestly. For the last few years, the entire world has been obsessed with AI that can talk. Write me a poem, summarize this email, uh, debug this Python script. But the real revolution, the one happening in enterprise stacks right now, is about AI that can act. |
| Alex | We’re talking about agentic AI, and just to set the stage for everyone listening, we know who you are. You aren’t beginners. You know what an LLM is. You’ve probably built a RAG pipeline, retrieval-augmented generation, where you dump a bunch of PDFs into a vector database and ask questions. You get the smart librarian phase of AI, |
| Sam | right? We aren’t here to explain what an embedding is. We’re here to bridge the gap between that simple text retrieval and what we call agentic AI. We’re talking about systems with autonomy, with persistent memory, and crucially, the ability to wield tools to execute complex workflows. |
| Alex | There’s a really fantastic line in the research notes we pulled for this deep dive that I think sums it up perfectly. It said, RAG is a pattern, but an agent is a process that controls patterns. |
| Sam | That is the core thesis right there. I mean, if you understand that distinction, you understand where the entire industry is going. So let’s |
| Alex | unpack that because I think a lot of people, when they hear AI agent, they just picture, you know, a slightly better version of ChatGPT. But it’s fundamentally different under the hood, isn’t it? |
| Sam | Completely different. Let’s |
| Alex | start with the old way, and it’s just wild that we’re calling 2024 the old way. But let’s talk about the standard RAG pipeline. |
| Sam | Sure. So in a standard RAG workflow, you have a very, um, a very linear path. A user asks a question. The system takes that question, turns it into numbers, finds the most relevant chunks of text from a database. And then it feeds those chunks to the LLM to generate an answer. It’s query, retrieve, generate, done. Exactly. And that’s fantastic for answer this question using this specific document. But the control flow is hard-coded. The AI doesn’t make any decisions about how to answer. It |
| Alex | follows the script. |
| Sam | It just follows the script. It can’t say, Wait, this document is outdated. Let me check the web instead. It can’t say this answer is ambiguous. I should ask the user for clarification. It only responds. It doesn’t act. |
| Alex | And that’s where the wall is. If you need to do something that requires multiple steps or decision making or error handling, simple RAG just falls apart. It’s brittle. |
| Sam | Enter the agent. In an agentic workflow, RAG isn’t the whole system anymore. It becomes just one tool in a toolbox. The agent is a continuous loop, a loop we usually describe as perceive, decide, act, and then remember. That |
| Alex | loop is the key differentiator. Instead of a linear pipeline, it’s circular. It is. |
| Sam | Instead of a rigid track, an agent looks at a goal, say, book me a flight to London under $600, and it creates a plan. It might decide, first I need to check flight prices. If it searches and finds everything is $800, a simple script would just fail or return the wrong price. It would just stop. But an agent can perceive that result and make a new decision. |
| Alex | It self-corrects. It might say, OK, the goal is impossible with these constraints. I’ll check dates plus or minus 3 days. Or, |
| Sam | or I’ll message the user to ask if the budget is flexible, precisely. In the industry we call this the evaluate and repeat step. The agent iterates until the goal is satisfied. It’s the difference between a function execution and, well, a reasoning |
| Alex | engine. OK, so that’s the high level concept, but let’s tear the hood off. One of the critiques of AI agents is that people think it’s just magic. But it’s code. It’s all just code. The research highlights six essential components that make an agent tick. I want to walk through these because if you want to build these things, you need to understand the anatomy. Let’s do it. The first one is obvious, the foundation model, the LLM itself. But in an agent, is it playing a different role than in a chatbot? |
| Sam | Oh, absolutely. In a chatbot, the LLM is the writer. In an agent, the LLM is the router or the planner. We aren’t using it primarily to generate prose. We’re using its reasoning capabilities to interpret instructions, decompose tasks, and choose actions. It’s the CPU of the operation, |
| Alex | which leads directly to component number 2, planning and control logic. This feels like the part that was missing in the early days. |
| Sam | This is the biggest leap. Planning is the ability to break a high-level, vague goal into a sequence of executable steps. You’ll hear terms like chain of thought reasoning here, right, |
| Alex | where the model essentially talks to itself before answering like, first I need to do X, then Y, then Z. Yes, |
| Sam | that explicit planning step is crucial because it allows the system to catch logic errors before they happen. It’s also where the agent handles flow control. If step A fails, do I retry? Do I try a different tool, or do I |
| Alex | escalate to a human? |
| Sam | Yeah, exactly. That decision tree isn’t hardcoded by a developer anymore. It’s reasoned out by the |
| Alex | model. OK, so we have a brain and a plan, but a brain in a jar can’t do anything. We need hands. That’s component 3, tool interfaces. |
| Sam | This is where things get really fun and dangerous in a good way. Tools are just connectors. In the past, an LLM could only output text. Now we give it functions. Here is a function to query a SQL database. Here is a function to send a Slack message. Here is a calculator, |
| Alex | and the agent decides when to pull the trigger on its own. |
| Sam | That’s the autonomy. You don’t script it to call the API now. You tell the agent, Here is a tool you can use if you need it. The model figures out the context. Oh, the user is asking for a math calculation. I should use the calculator tool, and then it executes it. |
| Alex | Next up is memory, and we aren’t just talking about the context window here, are we? |
| Sam | No, not just the context window. The context window is like short term RAM. It gets wiped when the session ends. A true agent needs long-term memory, so |
| Alex | it remembers me between conversations. |
| Sam | Exactly. This is usually a vector database or a structured SQL table where it stores user preferences, past mistakes, or domain knowledge. It allows the agent to learn. If I tell it I hate aisle seats today, it needs to remember that next month. |
| Alex | Component 5 is state and policy. This sounds a bit like computer science homework, but the sources emphasize it’s critical for enterprise use. It’s |
| Sam | actually very practical. Think of state as a GPS pin. Where am I right now in this multi-step process? Have I finished step 2? Is step 3 failing? Without knowing its state, the agent just gets lost in a loop. Policy is the guardrails, the rules of the road. If you give an AI a tool to initiate refunds, you better have a policy that says do not approve anything over $50 without a human signing off. The agent checks its state against the policy constantly. It’s the safety mechanism. Finally, |
| Alex | component six, observation and feedback. |
| Sam | This is the eyes. When an agent uses a tool, say it queries a database, it has to observe the output. Did it get a 200 OK? Did it get a 404 error? Did it get a weirdly empty list? |
| Alex | A standard chatbot would just blindly trust the output and |
| Sam | hallucinate. Agents scrutinize it. If the tool fails, the observation module triggers a retry or a new plan. |
| Alex | So to recap, a brain, a plan, tools, memory, rules, and eyes. When you combine those, you get something that looks less like software and more like a, uh, like a worker, a digital worker. Yes. Let’s ground this. The theory is nice, but I want to see it in action. The notes had a great example of a refund agent. I think walking through this will show the difference between the dumb way and the agent way. Let’s |
| Sam | do it. Let’s look at a tier one customer support ticket. A user messages in, I want a refund for my subscription. OK. |
| Alex | In the dumb RAG world, the 2024 world, what |
| Sam | happens? The bot takes the query refund, searches the knowledge base, finds the refund policy document, and spits out. According to our policy, refunds are available within 30 days of purchase, which |
| Alex | is technically true, but completely useless. The user doesn’t want to know the policy, they want the money, right? |
| Sam | The user ends up screaming, I know that, do it, and typing human over and over again. The bot failed because it had knowledge but no agency. |
| Alex | OK, so now let’s watch the agentic workflow. Same request. I want a refund. |
| Sam | Step one. The foundation model analyzes intent. It says the user is not asking for information. They’re asking for an action. |
| Alex | So it immediately triggers the planning module. |
| Sam | Yep, it drafts a plan. It might look something like: 1, identify the user; 2, pull their transaction history; 3, check the date against the policy; 4, if valid, execute the refund. |
| Alex | And now it needs tools to do that. |
| Sam | Exactly. It calls the CRM API tool. It finds the user’s profile, then it calls the billing API. It sees the charge was $75 made 14 days ago. |
| Alex | Now here is where state and policy comes in, right? |
| Sam | That’s it. The agent checks the state: $75, 14 days ago. It checks the policy: refunds under $100 within 30 days are auto-approved. The logic passes. But let’s |
| Alex | throw a wrench in the gears. This is where the process part matters. Let’s say the billing API is down. It returns a timeout error. OK, |
| Sam | good one. A dumb bot would crash or hallucinate refund complete because it just assumes the tool worked. The agent, though, utilizes its observation module. It sees the error code. It thinks, I can’t complete the action. It then modifies the plan. I will flag this ticket for a human review and tell the user there is a technical delay. So |
| Alex | it replies, I’m having trouble accessing the system right now, but I’ve filed a ticket for you. |
| Sam | Exactly. It handles the failure gracefully. It behaves like a competent junior support rep. It solves the problem end to end. It doesn’t just know the answer, it does the work. That |
| Alex | distinction between knowing the answer and doing the work, that is the billion dollar shift. But here’s where the source material gets really fascinating. We’ve been talking about one agent, the refund bot, but most complex work isn’t done by one person. It’s done by teams, |
| Sam | right? If you’re a startup, maybe one founder does everything, but as you scale, you hire specialists. AI is following the exact same trajectory. We are seeing the rise of multi-agent systems. Why |
| Alex | though? I mean, if GPT 6 or whatever we’re on is so smart, why can’t one God model do it all? Why do I need a team of AIs? |
| Sam | It comes down to complexity and context. If you have one massive prompt trying to be a lawyer, a coder, a writer, and a project manager all at once. The instructions just get muddy. The model gets confused. It forgets rules. It’s the |
| Alex | jack of all trades problem, master |
| Sam | of none. Exactly. And also think about context windows. If you cram the instructions for 5 different jobs into one prompt, you’re eating up valuable memory. By splitting them up, you create specialization. You have a researcher agent, a writer agent, a coder agent, and a QA agent. |
| Alex | And the cool part from the notes is that they don’t even have to use the same brain, |
| Sam | right? Correct. Your coder agent might be running on a model specifically fine-tuned for Python. Your writer agent might be running on a model with high creativity settings. Your QA agent might just be a cheaper, faster model checking for syntax errors. You optimize the brain for the specific job. |
| Alex | But if you have 5 agents running around, isn’t that just chaos? I mean, who’s in |
| Sam | charge? That is the new discipline, orchestration. You need a manager. An orchestrator agent that controls the flow. It decides who speaks next. |
| Alex | Is that just a linear chain? Like agent A passes to agent B who passes to C. |
| Sam | It can start that way, but the advanced systems are adaptive. The orchestrator acts like a real project manager. It might look at the researcher’s output and say, this isn’t good enough. Go back and search again. Or, this is too technical. Send it to the simplifier agent. It dynamically routes the work based on the quality of the output. |
| Alex | There was one concept in the notes I really liked for coordination, the blackboard. |
| Sam | Yes, this is a classic computer science concept brought back to life. Imagine a team in a conference room with a big white board. The blackboard is a shared memory state. So |
| Alex | instead of whispering to each other, they all just write on the board for everyone to see. |
| Sam | Exactly. The researcher finds a fact and writes it on the blackboard. The writer sees it and drafts a paragraph. The compliance agent reads the paragraph and erases a sentence that violates policy. Everyone operates on a shared evolving understanding of the project. It |
| Alex | really is mimicking human organizational structure. We are recreating the department, the meeting room, and the workflow in silicon. We |
| Sam | are, we are moving from prompt engineering, trying to whisper the right words to a god, to flow engineering: designing the org chart for your AI workers. |
| Alex | Let’s apply this to a complex example from their research, a contract analysis team. This is a task that requires very different types of thinking, so it’s perfect for a multi-agent approach, |
| Sam | right? The goal is review this vendor contract, find the risks, and rewrite the dangerous clauses. |
| Alex | If I threw that at a standard chatbot, I’d just get a generic summary. This contract looks standard, you know, something useless, |
| Sam | totally useless. But the multi-agent team tackles it differently. First, you have the intake agent. It’s the project lead. It reads the email, downloads the PDF, and creates the job ticket on the blackboard. It breaks the request down into tasks. Then what happens? Then the extraction agent wakes up. It’s using OCR tools. Its only job is to turn that messy PDF into structured text. It doesn’t care about legal risks. It just cares about data fidelity. |
| Alex | So it parses the document into something clean like JSON. |
| Sam | Exactly. Once the text is clean, you need the lawyer. So enter the risk analysis agent. This agent has access to a specific RAG tool containing the company’s legal playbook. |
| Alex | Ah, so it has specialized knowledge, |
| Sam | very specialized. It reads the extracted clauses and flags discrepancies. Hey, this liability cap is $1 but our policy requires $5. This jurisdiction is New York, but we prefer Delaware. It writes those flags on the blackboard. |
| Alex | OK, spotting the problem is one thing, fixing it is another. |
| Sam | That’s why you have a separate drafting agent. It sees the flags on the blackboard and proposes a red line, a specific rewrite of the legal language to fix the cap. It’s optimized for legal writing style, and |
| Alex | surely you don’t just email that out immediately. That sounds risky. |
| Sam | God, no. You have a reviewer agent or a policy enforcer. It checks the work of the drafter against the original constraints. And |
| Alex | what happens if they disagree? Let’s say the drafter fixes the liability cap but messes up the jurisdiction clause. |
| Sam | That’s handled by coordination policies. Just like in a real office, you have conflict resolution rules. The system might say if the reviewer rejects the draft twice, escalate to a human lawyer, or the risk analysis agent has veto power. |
| Alex | So the orchestrator is enforcing these policies in real time, |
| Sam | correct? And once everyone signs off, the execution agent logs into the contract life cycle management system and uploads the new version. |
| Alex | It’s a complete assembly line from start to finish. It is. |
| Sam | And think about the scalability. You can have this team reviewing 1000 contracts simultaneously, 24/7. It changes the economics of legal review entirely. |
| Alex | That is both incredible and slightly terrifying, but it illustrates the power of specialization. You couldn’t do that with one prompt, not reliably. |
| Sam | No, you need the checks and balances. You need the different perspectives, even if those perspectives are just different software agents. |
| Alex | So we’ve gone from the pattern of RAG, simple retrieval, to the process of single agents, these loops and tools, and finally to the coordination of multi-agent |
| Sam | teams. That’s the arc, and that is what 2026 is all about. |
| Alex | I want to leave our listeners with a challenge. We know a lot of you are building these systems, or at least managing teams that do. |
| Sam | So here’s a mental exercise for you. Look at a current RAG project you have. Maybe it’s a customer support bot. Maybe it’s an internal knowledge search that people are complaining about. |
| Alex | And ask yourself, how would I fire this bot and hire a team instead, |
| Sam | right? Don’t just think about getting a better answer. Think about the roles. If you were hiring humans to do this job, who would you hire? Do you need a researcher, a fact checker, a stylist, a manager? |
| Alex | Sketch out the org chart, draw the meeting room. List the tools each person would need, |
| Sam | because the tools to build that structure, frameworks like LangChain, AutoGen, and CrewAI, are available right now. You aren’t limited by the model anymore. You are limited by your ability to design the workflow. |
| Alex | That’s the thought for the day. We are moving from prompt engineering to flow engineering. It’s gonna |
| Sam | be a wild few years. |
| Alex | Thanks for diving deep with us. We’ll see you next time. |
Core agent components appear as a loop: a foundation model for reasoning and decisions, planning and control logic to break goals into subtasks, tool interfaces to external systems and RAG, memory for context and history, explicit state and policies expressing constraints, and observation/feedback to evaluate tool outputs and decide when to retry, adjust, or escalate. A customer support example shows how an agent not only explains refund policy but also verifies accounts, calls CRM and billing APIs, applies rules, and coordinates the end-to-end resolution.
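The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation discussed in the session: the function `run_refund_agent`, the `ToolError` exception, the tool callables, and the policy constants are all hypothetical names, and a fixed state machine stands in for the planning that a foundation model would do in a real agent. What it does show is the act, observe, and retry-or-escalate cycle, with policy checked against state before any refund is issued.

```python
# Hypothetical policy values, loosely based on the refund walkthrough.
AUTO_APPROVE_LIMIT = 100.00   # dollars; larger refunds go to a human
REFUND_WINDOW_DAYS = 30


class ToolError(Exception):
    """Raised by a tool when its backing system fails (e.g. a timeout)."""


def run_refund_agent(email, crm_lookup, billing_lookup, issue_refund, max_retries=1):
    """Run the loop until the refund is issued or the ticket is escalated.

    Returns (outcome, errors) where outcome is "refunded" or "escalated".
    """
    facts = {"email": email}   # working memory for this ticket
    state = "identify_user"    # where the agent is in its plan
    attempts = 0               # consecutive failures of the current step
    errors = []                # observations of failed tool calls

    while True:
        try:
            # Act: run the tool or check that the current state calls for.
            if state == "identify_user":
                facts["user_id"] = crm_lookup(email)
                next_state = "fetch_charge"
            elif state == "fetch_charge":
                facts["amount"], facts["days_ago"] = billing_lookup(facts["user_id"])
                next_state = "check_policy"
            elif state == "check_policy":
                allowed = (facts["amount"] < AUTO_APPROVE_LIMIT
                           and facts["days_ago"] <= REFUND_WINDOW_DAYS)
                next_state = "refund" if allowed else "escalate"
            elif state == "refund":
                issue_refund(facts["user_id"], facts["amount"])
                return "refunded", errors
            else:  # "escalate": hand the ticket to a human
                return "escalated", errors
            attempts = 0
        except ToolError as err:
            # Observe: record the failure, then decide to retry or re-plan.
            errors.append(f"{state}: {err}")
            attempts += 1
            next_state = state if attempts <= max_retries else "escalate"
        state = next_state


def billing_down(user_id):
    """Stub billing tool that simulates an outage."""
    raise ToolError("billing API timeout")


# Happy path: a $75 charge from 14 days ago passes the policy check.
print(run_refund_agent("a@example.com", lambda e: "u1",
                       lambda u: (75.0, 14), lambda u, a: None))
# Failure path: billing is down, so the agent retries once, then escalates.
print(run_refund_agent("a@example.com", lambda e: "u1",
                       billing_down, lambda u, a: None))
```

Passing the tools in as plain callables keeps the loop independent of any particular CRM or billing system, which also makes the failure path easy to exercise with a stub.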
The discussion then extends to multi-agent systems, where specialized agents play distinct roles such as Planner, Retriever, Domain Expert, Generator, and Executor. An orchestrator coordinates which agent runs when, based on shared state, using either a fixed or adaptive pattern, while a shared memory/blackboard and coordination policies govern hand-offs, conflict resolution, and stopping conditions. A contract analysis and negotiation scenario illustrates this: planner for task decomposition, extractor for clause parsing, risk analyst using RAG over internal policy, drafter for redlines, and executor for integration with enterprise systems and human review. The overall narrative shifts the perspective from single-pass Q&A toward goal-directed, tool-using, and orchestrated systems for complex business processes.
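A blackboard with an adaptive orchestrator can be sketched as follows. Everything here is an assumption made for illustration: the `Blackboard` fields, the agent function names, the `PLAYBOOK` contents (only the Delaware preference comes from the session), and the rule that two reviewer rejections trigger human escalation. Each agent is a plain function standing in for a model-backed specialist, and the orchestrator routes on the shared `status` field rather than following a fixed chain.

```python
from dataclasses import dataclass, field

# Hypothetical legal playbook; a real system would query this through RAG.
PLAYBOOK = {"jurisdiction": "Delaware", "auto_renewal": False}


@dataclass
class Blackboard:
    """Shared state that every agent reads from and writes to."""
    clauses: dict = field(default_factory=dict)    # extracted clause values
    flags: list = field(default_factory=list)      # clauses that violate policy
    redlines: dict = field(default_factory=dict)   # proposed replacements
    rejections: int = 0                            # reviewer rejections so far
    status: str = "new"


def extraction_agent(board, raw_contract):
    board.clauses = dict(raw_contract)
    board.status = "extracted"


def risk_agent(board):
    board.flags = [k for k, want in PLAYBOOK.items() if board.clauses.get(k) != want]
    board.status = "analyzed"


def drafting_agent(board):
    board.redlines = {k: PLAYBOOK[k] for k in board.flags}
    board.status = "drafted"


def reviewer_agent(board):
    # Check the contract with redlines applied against the playbook.
    merged = {**board.clauses, **board.redlines}
    if all(merged.get(k) == want for k, want in PLAYBOOK.items()):
        board.status = "approved"
    else:
        board.rejections += 1
        board.status = "rejected"


def orchestrate(raw_contract, drafter=drafting_agent, max_rejections=2):
    """Pick the next agent from the blackboard status until a stop condition."""
    board = Blackboard()
    while True:
        if board.status == "new":
            extraction_agent(board, raw_contract)
        elif board.status == "extracted":
            risk_agent(board)
        elif board.status in ("analyzed", "rejected"):
            # Coordination policy: too many rejections goes to a human.
            if board.rejections >= max_rejections:
                board.status = "escalated_to_human"
                return board
            drafter(board)
        elif board.status == "drafted":
            reviewer_agent(board)
        else:  # "approved"
            return board


result = orchestrate({"jurisdiction": "New York", "auto_renewal": False})
print(result.status, result.redlines)
```

Routing on shared state is what lets the orchestrator send a rejected draft back to the drafter instead of moving forward, and the rejection counter gives the loop an explicit stopping condition.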
Readings
- AI Agents in Action by Micheal Lanham, Manning Publications, February 2025
- Building Applications with AI Agents by Michael Albada, O’Reilly Media, Inc., September 2025
- Building Generative AI Agents: Using LangGraph, AutoGen, and CrewAI by Tom Taulli, Gaurav Deshmukh, Apress, May 2025
- Building AI Agents with LLMs, RAG, and Knowledge Graphs by Salvatore Raieli, Gabriele Iuculano, Packt Publishing, July 2025