Agent Systems

The session frames agentic AI as an evolution of basic RAG for business workflows. It starts by contrasting a static RAG pipeline—query, retrieve, answer—with an AI agent that interprets goals, plans multi-step processes, chooses among tools (including RAG), and iterates based on feedback until the goal is satisfied.
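
To make the contrast concrete, here is a minimal Python sketch built from toy stand-ins. `retrieve` is a keyword lookup rather than a vector search, and `Agent.decide` is a hard-coded rule standing in for an LLM planner. The names `static_rag`, `Agent`, and the `web_search` tool are hypothetical and do not come from any framework. The point is structural: the pipeline runs once, while the agent treats RAG as one tool inside a loop. It keeps choosing tools until one returns something useful or it runs out of options.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Toy knowledge base; a real system would use embeddings and a vector store.
DOCS = {
    "refund": "Refunds are available within 30 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}


def retrieve(query: str) -> str:
    """Toy retriever: return the first document whose key appears in the query."""
    for key, text in DOCS.items():
        if key in query.lower():
            return text
    return ""


def static_rag(query: str) -> str:
    """Linear pipeline: query -> retrieve -> answer. No decisions, no retries."""
    context = retrieve(query)
    return f"Answer based on: {context or 'nothing found'}"


@dataclass
class Agent:
    # Tool name -> callable; insertion order doubles as the preference order.
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 5
    trace: List[Tuple[str, str]] = field(default_factory=list)

    def decide(self) -> Optional[str]:
        """Stand-in for the LLM planner (hypothetical): pick the first untried tool."""
        tried = {name for name, _ in self.trace}
        for name in self.tools:
            if name not in tried:
                return name
        return None

    def run(self, goal: str) -> str:
        self.trace = []  # fresh working memory for this goal
        for _ in range(self.max_steps):
            tool = self.decide()                    # decide
            if tool is None:
                break
            observation = self.tools[tool](goal)    # act
            self.trace.append((tool, observation))  # remember
            if observation:                         # perceive/evaluate: goal satisfied?
                return observation
        return "Could not satisfy the goal; escalating to a human."


agent = Agent(tools={"rag": retrieve, "web_search": lambda q: f"Web result for: {q}"})
print(static_rag("What is the warranty?"))  # Answer based on: nothing found
print(agent.run("What is the warranty?"))   # Web result for: What is the warranty?
```

Because the toy knowledge base has no warranty document, the static pipeline can only report that nothing was found. The agent observes the empty retrieval and moves on to its next tool instead.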

From Prompt Engineering to Flow Engineering

Transcript

Speaker	Text
Alex	Welcome back to the Deep Dive. Today we are unpacking a shift in the AI landscape that, you know, it feels almost inevitable when you look back at it. But right now in 2026, it is completely reshaping how we build software. We’re officially moving beyond the chatbot.
Sam	It’s about time, honestly. For the last few years, the entire world has been obsessed with AI that can talk. Write me a poem, summarize this email, uh, debug this Python script. But the real revolution, the one happening in enterprise stacks right now, is about AI that can act.
Alex	We’re talking about agentic AI, and just to set the stage for everyone listening, we know who you are. You aren’t beginners. You know what an LLM is. You’ve probably built a RAG pipeline, retrieval-augmented generation, where you dump a bunch of PDFs into a vector database and ask questions. You’ve lived through the smart librarian phase of AI, right?
Sam	We aren’t here to explain what an embedding is. We’re here to bridge the gap between that simple text retrieval and what we call agentic AI. We’re talking about systems with autonomy, with persistent memory, and crucially, the ability to wield tools to execute complex workflows.
Alex	There’s a really fantastic line in the research notes we pulled for this deep dive that I think sums it up perfectly. It said, “RAG is a pattern, but an agent is a process that controls patterns.”
Sam	That is the core thesis right there. I mean, if you understand that distinction, you understand where the entire industry is going.
Alex	So let’s unpack that, because I think a lot of people, when they hear “AI agent,” just picture, you know, a slightly better version of ChatGPT. But it’s fundamentally different under the hood, isn’t it?
Sam	Completely different.
Alex	Let’s start with the old way, and it’s just wild that we’re calling 2024 the old way. But let’s talk about the standard RAG pipeline.
Sam	Sure. So in a standard RAG workflow, you have a very, um, a very linear path. A user asks a question. The system takes that question, turns it into numbers, and finds the most relevant chunks of text in a database. Then it feeds those chunks to the LLM to generate an answer. It’s query, retrieve, generate, done. And that’s fantastic for “answer this question using this specific document.” But the control flow is hardcoded. The AI doesn’t make any decisions about how to answer.
Alex	It follows the script.
Sam	It just follows the script. It can’t say, “Wait, this document is outdated. Let me check the web instead.” It can’t say, “This answer is ambiguous. I should ask the user for clarification.” It only responds. It doesn’t act.
Alex	And that’s where the wall is. If you need to do something that requires multiple steps or decision making or error handling, simple rag just falls apart. It’s brittle.
Sam	Enter the agent. In an agentic workflow, RAG isn’t the whole system anymore. It becomes just one tool in a toolbox. The agent runs a continuous loop, one we usually describe as perceive, decide, act, and then remember.
Alex	That loop is the key differentiator. Instead of a linear pipeline, it’s circular.
Sam	It is. Instead of following a rigid track, an agent looks at a goal, say, “Book me a flight to London under $600,” and creates a plan. It might decide, first I need to check flight prices. If it searches and finds that everything is $800, a simple script would just fail or return the wrong price. It would just stop. An agent, however, can perceive that result and make a new decision.
Alex	It self-corrects. It might say, OK, the goal is impossible with these constraints. I’ll check dates plus or minus 3 days. Or
Sam	Or I’ll message the user to ask if the budget is flexible. Precisely. In the industry we call this the evaluate-and-repeat step: the agent iterates until the goal is satisfied. It’s the difference between executing a function and, well, a reasoning engine.
Alex	OK, so that’s the high-level concept, but let’s tear the hood off. One of the critiques of AI agents is that people think they’re just magic. But it’s code. It’s all just code. The research highlights six essential components that make an agent tick. I want to walk through these, because if you want to build these things, you need to understand the anatomy. Let’s do it. The first one is obvious: the foundation model, the LLM itself. But in an agent, is it playing a different role than in a chatbot?
Sam	Oh, absolutely. In a chatbot, the LLM is the writer. In an agent, the LLM is the router or the planner. We aren’t using it primarily to generate prose. We’re using its reasoning capabilities to interpret instructions, decompose tasks, and choose actions. It’s the CPU of the operation.
Alex	Which leads directly to component number 2, planning and control logic. This feels like the part that was missing in the early days.
Sam	This is the biggest leap. Planning is the ability to break a vague, high-level goal into a sequence of executable steps. You’ll hear terms like chain-of-thought reasoning here.
Alex	Right, where the model essentially talks to itself before answering, like, first I need to do X, then Y, then Z.
Sam	Yes, that explicit planning step is crucial because it allows the system to catch logic errors before they happen. It’s also where the agent handles flow control. If step A fails, do I retry? Do I try a different tool? Or do I
Alex	escalate to a human?
Sam	Yeah, exactly. That decision tree isn’t hardcoded by a developer anymore. It’s reasoned out by the model.
Alex	OK, so we have a brain and a plan, but a brain in a jar can’t do anything. We need hands. That’s component 3, tool interfaces.
Sam	This is where things get really fun and dangerous in a good way. Tools are just connectors. In the past, an LLM could only output text. Now we give it functions. Here is a function to query a SQL database. Here is a function to send a Slack message. Here is a calculator,
Alex	and the agent decides when to pull the trigger on its own.
Sam	That’s the autonomy. You don’t script “call API” anymore. You tell the agent, “Here is a tool you can use if you need it.” The model figures out the context: oh, the user is asking for a math calculation, so I should use the calculator tool. And then it executes it.
Alex	Next up is memory, and we aren’t just talking about the context window here, are we?
Sam	No, not just the context window. The context window is like short-term RAM. It gets wiped when the session ends. A true agent needs long-term memory.
Alex	So it remembers me between conversations.
Sam	Exactly. This is usually a vector database or a structured SQL table where it stores user preferences, past mistakes, or domain knowledge. It allows the agent to learn. If I tell it I hate aisle seats today, it needs to remember that next month.
Alex	Component 5 is state and policy. This sounds a bit like computer science homework, but the sources emphasize it’s critical for enterprise use.
Sam	It’s actually very practical. Think of state as a GPS pin: where am I right now in this multi-step process? Have I finished step 2? Is step 3 failing? Without knowing its state, the agent just gets lost in a loop. Policy is the guardrails, the rules of the road. If you give an AI a tool to initiate refunds, you’d better have a policy that says do not approve anything over $50 without a human signing off. The agent checks its state against the policy constantly. It’s the safety mechanism.
Alex	Finally, component six, observation and feedback.
Sam	This is the eyes. When an agent uses a tool, say it queries a database, it has to observe the output. Did it get a 200 OK? Did it get a 404 error? Did it get a weirdly empty list?
Alex	A standard chatbot would just blindly trust the output and hallucinate.
Sam	Agents scrutinize it. If the tool fails, the observation module triggers a retry or a new plan.
Alex	So to recap: a brain, a plan, tools, memory, rules, and eyes. When you combine those, you get something that looks less like software and more like, uh, like a worker, a digital worker. Yes. Let’s ground this. The theory is nice, but I want to see it in action. The notes had a great example of a refund agent. I think walking through this will show the difference between the dumb way and the agent way.
Sam	Let’s do it. Let’s look at a tier-one customer support ticket. A user messages in: “I want a refund for my subscription.”
Alex	OK. In the dumb RAG world, the 2024 world, what happens?
Sam	The bot takes the query “refund,” searches the knowledge base, finds the refund policy document, and spits out, “According to our policy, refunds are available within 30 days of purchase.”
Alex	Which is technically true, but completely useless. The user doesn’t want to know the policy, they want the money, right?
Sam	The user ends up screaming, “I know that, do it!” and typing “human” over and over again. The bot failed because it had knowledge but no agency.
Alex	OK, so now let’s watch the agentic workflow. Same request. I want a refund.
Sam	Step one. The foundation model analyzes intent. It says the user is not asking for information. They’re asking for an action.
Alex	So it immediately triggers the planning module.
Sam	Yep, it drafts a plan. It might look something like this: 1, identify the user; 2, pull their transaction history; 3, check the date against the policy; 4, if valid, execute the refund.
Alex	And now it needs tools to do that.
Sam	Exactly. It calls the CRM API tool. It finds the user’s profile, then it calls the billing API. It sees the charge was $75 made 14 days ago.
Alex	Now here is where state and policy come in, right?
Sam	That’s it. The agent checks the state: $75, 14 days ago. It checks the policy: refunds under $100 within 30 days are auto-approved. The logic passes.
Alex	But let’s throw a wrench in the gears. This is where the process part matters. Let’s say the billing API is down. It returns a timeout error.
Sam	OK, good one. A dumb bot would crash or hallucinate “refund complete” because it just assumes the tool worked. The agent, though, uses its observation module. It sees the error code. It thinks, I can’t complete the action. It then modifies the plan: I will flag this ticket for human review and tell the user there is a technical delay.
Alex	So it replies, “I’m having trouble accessing the system right now, but I’ve filed a ticket for you.”
Sam	Exactly. It handles the failure gracefully. It behaves like a competent junior support rep. It solves the problem end to end. It doesn’t just know the answer, it does the work.
Alex	That distinction between knowing the answer and doing the work, that is the billion-dollar shift. But here’s where the source material gets really fascinating. We’ve been talking about one agent, the refund bot, but most complex work isn’t done by one person. It’s done by teams, right?
Sam	If you’re a startup, maybe one founder does everything, but as you scale, you hire specialists. AI is following the exact same trajectory. We are seeing the rise of multi-agent systems.
Alex	Why, though? I mean, if GPT-6 or whatever we’re on is so smart, why can’t one god model do it all? Why do I need a team of AIs?
Sam	It comes down to complexity and context. If you have one massive prompt trying to be a lawyer, a coder, a writer, and a project manager all at once, the instructions just get muddy. The model gets confused. It forgets rules.
Alex	It’s the jack-of-all-trades problem, master of none.
Sam	Exactly. And also think about context windows. If you cram the instructions for 5 different jobs into one prompt, you’re eating up valuable memory. By splitting them up, you create specialization. You have a researcher agent, a writer agent, a coder agent, and a QA agent.
Alex	And the cool part from the notes is that they don’t even have to use the same brain, right?
Sam	Correct. Your coder agent might be running on a model specifically fine-tuned for Python. Your writer agent might be running on a model with high creativity settings. Your QA agent might just be a cheaper, faster model checking for syntax errors. You optimize the brain for the specific job.
Alex	But if you have 5 agents running around, isn’t that just chaos? I mean, who’s in charge?
Sam	That is the new discipline: orchestration. You need a manager, an orchestrator agent that controls the flow. It decides who speaks next.
Alex	Is that just a linear chain? Like agent A passes to agent B, who passes to C?
Sam	It can start that way, but the advanced systems are adaptive. The orchestrator acts like a real project manager. It might look at the researcher’s output and say, “This isn’t good enough. Go back and search again.” Or, “This is too technical. Send it to the simplifier agent.” It dynamically routes the work based on the quality of the output.
Alex	There was one concept in the notes I really liked for coordination, the blackboard.
Sam	Yes, this is a classic computer science concept brought back to life. Imagine a team in a conference room with a big whiteboard. The blackboard is a shared memory state.
Alex	So instead of whispering to each other, they all just write on the board for everyone to see.
Sam	Exactly. The researcher finds a fact and writes it on the blackboard. The writer sees it and drafts a paragraph. The compliance agent reads the paragraph and erases a sentence that violates policy. Everyone operates on a shared, evolving understanding of the project.
Alex	It really is mimicking human organizational structure. We are recreating the department, the meeting room, and the workflow in silicon.
Sam	We are. We are moving from prompt engineering, trying to whisper the right words to a god, to flow engineering: designing the org chart for your AI workers.
Alex	Let’s apply this to a complex example from their research, a contract analysis team. This is a task that requires very different types of thinking, so it’s perfect for a multi-agent approach, right?
Sam	The goal is: review this vendor contract, find the risks, and rewrite the dangerous clauses.
Alex	If I threw that at a standard chatbot, I’d just get a generic summary, “This contract looks standard,” you know, something useless.
Sam	Totally useless. But the multi-agent team tackles it differently. First, you have the intake agent. It’s the project lead. It reads the email, downloads the PDF, and creates the job ticket on the blackboard. It breaks the request down into tasks. Then what happens? Then the extraction agent wakes up. It uses OCR tools. Its only job is to turn that messy PDF into structured text. It doesn’t care about legal risks. It just cares about data fidelity.
Alex	So it parses the document into something clean like JSON.
Sam	Exactly. Once the text is clean, you need the lawyer. So enter the risk analysis agent. This agent has access to a specific RAG tool containing the company’s legal playbook.
Alex	Ah, so it has specialized knowledge,
Sam	very specialized. It reads the extracted clauses and flags discrepancies. Hey, this liability cap is $1 but our policy requires $5. This jurisdiction is New York, but we prefer Delaware. It writes those flags on the blackboard.
Alex	OK, spotting the problem is one thing, fixing it is another.
Sam	That’s why you have a separate drafting agent. It sees the flags on the blackboard and proposes a redline, a specific rewrite of the legal language to fix the cap. It’s optimized for legal writing style.
Alex	And surely you don’t just email that out immediately. That sounds risky.
Sam	God, no. You have a reviewer agent, or a policy enforcer. It checks the work of the drafter against the original constraints.
Alex	And what happens if they disagree? Let’s say the drafter fixes the liability cap but messes up the jurisdiction clause.
Sam	That’s handled by coordination policies. Just like in a real office, you have conflict resolution rules. The system might say if the reviewer rejects the draft twice, escalate to a human lawyer, or the risk analysis agent has veto power.
Alex	So the orchestrator is enforcing these policies in real time.
Sam	Correct. And once everyone signs off, the execution agent logs into the contract lifecycle management system and uploads the new version.
Alex	It’s a complete assembly line from start to finish.
Sam	It is. And think about the scalability. You can have this team reviewing 1,000 contracts simultaneously, 24/7. It changes the economics of legal review entirely.
Alex	That is both incredible and slightly terrifying, but it illustrates the power of specialization. You couldn’t do that with one prompt, not reliably.
Sam	No, you need the checks and balances. You need the different perspectives, even if those perspectives are just different software agents.
Alex	So we’ve gone from the pattern of RAG’s simple retrieval, to the process of single agents with their loops and tools, and finally to the coordination of multi-agent teams.
Sam	That’s the arc, and that is what 2026 is all about.
Alex	I want to leave our listeners with a challenge. We know a lot of you are building these systems, or at least managing teams that do.
Sam	So here’s a mental exercise for you. Look at a current RAG project you have. Maybe it’s a customer support bot. Maybe it’s an internal knowledge search that people are complaining about.
Alex	And ask yourself: how would I fire this bot and hire a team instead?
Sam	Right. Don’t just think about getting a better answer. Think about the roles. If you were hiring humans to do this job, who would you hire? Do you need a researcher, a fact-checker, a stylist, a manager?
Alex	Sketch out the org chart, draw the meeting room, list the tools each person would need.
Sam	Because the tools to build that structure, frameworks like LangChain, AutoGen, and CrewAI, are available right now. You aren’t limited by the model anymore. You are limited by your ability to design the workflow.
Alex	That’s the thought for the day. We are moving from prompt engineering to flow engineering.
Sam	It’s gonna be a wild few years.
Alex	Thanks for diving deep with us. We’ll see you next time.

Core agent components appear as a loop: a foundation model for reasoning and decisions, planning and control logic to break goals into subtasks, tool interfaces to external systems and RAG, memory for context and history, explicit state and policies expressing constraints, and observation/feedback to evaluate tool outputs and decide when to retry, adjust, or escalate. A customer support example shows how an agent not only explains refund policy but also verifies accounts, calls CRM and billing APIs, applies rules, and coordinates the end-to-end resolution.
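
The refund walkthrough maps onto these components fairly directly. Below is a minimal Python sketch that rests on a few assumptions. The CRM, billing, and refund tools are plain callables injected by the caller; they are hypothetical stand-ins, not real APIs. A failing tool raises a hypothetical `ToolError`. The policy is read as "refunds under $100 within 30 days are auto-approved." The `Step` enum holds the explicit state, and the policy check acts as a guard. The `try`/`except` with a retry is the observation-and-feedback path, which escalates instead of claiming success.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple

# Policy thresholds from the refund example (assumed reading: under $100, within 30 days).
AUTO_APPROVE_LIMIT = 100.0
REFUND_WINDOW_DAYS = 30


class Step(Enum):
    """Explicit state: where the agent is in its plan."""
    IDENTIFY_USER = auto()
    FETCH_CHARGE = auto()
    CHECK_POLICY = auto()
    EXECUTE_REFUND = auto()
    DONE = auto()
    ESCALATED = auto()


class ToolError(Exception):
    """Hypothetical error a tool raises when it fails, e.g. a billing API timeout."""


@dataclass
class RefundAgent:
    crm: Callable[[str], str]                    # email -> customer id (hypothetical tool)
    billing: Callable[[str], Tuple[float, int]]  # customer id -> (amount, days since charge)
    refund: Callable[[str, float], str]          # (customer id, amount) -> receipt id
    step: Step = Step.IDENTIFY_USER
    log: List[str] = field(default_factory=list)

    def _call(self, tool: Callable, *args, retries: int = 1):
        """Observation/feedback: call a tool, record each failure, retry, then give up."""
        for attempt in range(1, retries + 2):
            try:
                return tool(*args)
            except ToolError as err:
                self.log.append(f"{self.step.name}: attempt {attempt} failed ({err})")
        raise ToolError(f"{self.step.name} failed after {retries + 1} attempts")

    def handle(self, email: str) -> str:
        try:
            customer = self._call(self.crm, email)
            self.step = Step.FETCH_CHARGE
            amount, days_ago = self._call(self.billing, customer)
            self.step = Step.CHECK_POLICY
            if amount >= AUTO_APPROVE_LIMIT or days_ago > REFUND_WINDOW_DAYS:
                self.step = Step.ESCALATED  # policy guardrail: a human must sign off
                return "This refund needs human approval; I've filed a ticket for you."
            self.step = Step.EXECUTE_REFUND
            receipt = self._call(self.refund, customer, amount)
            self.step = Step.DONE
            return f"Refunded ${amount:.2f} (receipt {receipt})."
        except ToolError:
            # Adjust the plan instead of hallucinating success.
            self.step = Step.ESCALATED
            return ("I'm having trouble accessing the system right now, "
                    "but I've filed a ticket for you.")


def billing_down(customer_id: str) -> Tuple[float, int]:
    raise ToolError("timeout")


happy = RefundAgent(crm=lambda e: "cust-42", billing=lambda c: (75.0, 14),
                    refund=lambda c, a: "r-001")
print(happy.handle("user@example.com"))  # Refunded $75.00 (receipt r-001).

outage = RefundAgent(crm=lambda e: "cust-42", billing=billing_down,
                     refund=lambda c, a: "r-001")
print(outage.handle("user@example.com"))  # the "technical delay" reply; step is ESCALATED
```

Keeping the state in an enum rather than in the conversation text makes it easy to log or persist where a ticket stalled. In the outage case, both timeouts are recorded under `FETCH_CHARGE` before the agent escalates.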

The discussion then extends to multi-agent systems, where specialized agents play distinct roles such as Planner, Retriever, Domain Expert, Generator, and Executor. An orchestrator coordinates which agent runs when, based on shared state, using either a fixed or adaptive pattern, while a shared memory/blackboard and coordination policies govern hand-offs, conflict resolution, and stopping conditions. A contract analysis and negotiation scenario illustrates this: planner for task decomposition, extractor for clause parsing, risk analyst using RAG over internal policy, drafter for redlines, and executor for integration with enterprise systems and human review. The overall narrative shifts the perspective from single-pass Q&A toward goal-directed, tool-using, and orchestrated systems for complex business processes.
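
Here is a minimal Python sketch of the blackboard pattern with an adaptive orchestrator, built from toy agents. The extractor parses `clause: value` lines instead of running OCR. The risk analyst checks a hard-coded `PLAYBOOK` dict instead of running RAG over internal policy. The jurisdiction rule (prefer Delaware) is borrowed from the transcript's example. All names, and the sample contract text, are hypothetical. The orchestrator picks the next agent from the blackboard's state and encodes two coordination policies: two reviewer rejections escalate to a human, and a turn budget acts as a stopping condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical legal playbook; stands in for RAG over internal policy documents.
PLAYBOOK = {"jurisdiction": "Delaware"}


@dataclass
class Blackboard:
    """Shared memory: every agent reads from and writes to this one object."""
    raw: str
    clauses: Dict[str, str] = field(default_factory=dict)
    flags: Optional[List[str]] = None  # None means "not analyzed yet"
    draft: Optional[Dict[str, str]] = None
    rejections: int = 0
    approved: bool = False
    escalated: bool = False


Agent = Callable[[Blackboard], None]


def extractor(bb: Blackboard) -> None:
    """Turn 'Clause: value' lines into structured clauses (stand-in for OCR + parsing)."""
    for line in bb.raw.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            bb.clauses[key.strip().lower()] = value.strip()


def risk_analyst(bb: Blackboard) -> None:
    """Flag clauses that deviate from the playbook."""
    bb.flags = [k for k, required in PLAYBOOK.items() if bb.clauses.get(k) != required]


def drafter(bb: Blackboard) -> None:
    """Propose a redline: copy the clauses and rewrite each flagged one."""
    bb.draft = dict(bb.clauses)
    for key in bb.flags or []:
        bb.draft[key] = PLAYBOOK[key]


def reviewer(bb: Blackboard) -> None:
    """Check the draft against the original constraints; a rejection sends it back."""
    if all(bb.draft.get(k) == v for k, v in PLAYBOOK.items()):
        bb.approved = True
    else:
        bb.rejections += 1
        bb.draft = None  # clearing the draft routes the work back to the drafter


def orchestrate(bb: Blackboard, draft_agent: Agent = drafter,
                max_rejections: int = 2, max_turns: int = 10) -> Blackboard:
    """Adaptive routing: choose the next agent from what is on the blackboard."""
    for _ in range(max_turns):
        if bb.approved or bb.rejections >= max_rejections:
            break  # stopping condition, or the escalation policy kicks in
        if not bb.clauses:
            extractor(bb)
        elif bb.flags is None:
            risk_analyst(bb)
        elif bb.draft is None:
            draft_agent(bb)
        else:
            reviewer(bb)
    bb.escalated = not bb.approved  # anything unresolved goes to a human lawyer
    return bb


contract = "Jurisdiction: New York\nTerm: 12 months"  # toy input
done = orchestrate(Blackboard(raw=contract))
print(done.approved, done.draft["jurisdiction"])  # True Delaware


def sloppy_drafter(bb: Blackboard) -> None:
    bb.draft = dict(bb.clauses)  # ignores the flags, so the reviewer keeps rejecting


stuck = orchestrate(Blackboard(raw=contract), draft_agent=sloppy_drafter)
print(stuck.rejections, stuck.escalated)  # 2 True
```

Routing on blackboard state, rather than on a fixed A-to-B-to-C chain, is what lets a rejected draft loop back to the drafter without any agent knowing about the others. Swapping in a fixed pattern would mean replacing the `if`/`elif` chain with a static list of agents.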

Readings

AI Agents in Action by Micheal Lanham, Manning Publications, February 2025
Building Applications with AI Agents by Michael Albada, O’Reilly Media, Inc., September 2025
Building Generative AI Agents: Using LangGraph, AutoGen, and CrewAI by Tom Taulli, Gaurav Deshmukh, Apress, May 2025
Building AI Agents with LLMs, RAG, and Knowledge Graphs by Salvatore Raieli, Gabriele Iuculano, Packt Publishing, July 2025