📝 Essently

Transcript & Summary: 5 AI Agent Terms You Need to Know

IBM Technology

Summary

Inside an agent there is a large language model that generates text and performs reasoning, and an instruction layer wrapped around it turns the model into an agent. Term number one is agents.md: a markdown file at the root of a project that the agent reads whenever it starts work there. It tells the agent which commands to run, setup steps, coding conventions, and formatting rules. Files can be nested, and those closer to the working directory override earlier ones. Term number two is agent skill: a folder containing a skill.md file, whose metadata describes when to invoke the skill, plus the scripts or resources the task needs. The skill is loaded only when a request matches that description. Term number three is MCP, the Model Context Protocol, an open protocol whose MCP servers wrap tools and data sources in a standard interface so the agent can talk to Notion, Stripe, or other backends. Term number four is A2A, an open protocol for agent-to-agent communication, where each agent publishes an agent card describing what it does and how to talk to it so other agents can delegate work. Term number five is subagents: child agents spawned by a main agent to handle a piece of work in a fresh context window, enabling parallelism and returning results to the parent.

Full transcript

[00:00] Frontier AI agents are pretty capable. They're really good at planning out tasks and writing code with minimal human involvement, but there are a handful of specific pieces under the hood that enable this. So let's cover five of those pieces, the five terms you need to know about agentic AI, and let's start with the stuff inside the agent that shapes how it behaves. Inside an agent, of course, there is a model, a large language model. That's what does the actual text generation and the reasoning, and by [00:37] itself, well, it's just a conversational partner. What turns it into an agent is the instruction layer wrapped around the model. So that brings us to term number one that you need to know: agents.md. So what's that? Well, .md is markdown, so it's just a text file. It sits at the root of a project, and whenever the agent starts work in that project, it reads whatever is in that agents.md file. The file tells the agent things like which commands to run for tests [01:19] or which coding conventions this code base uses. So we can think of it as a kind of readme file, but a readme written specifically for agents. It tells the agent things like specific setup commands to use, any code style rules, or how a PR title should be formatted. The agent executes the commands it finds in agents.md when they're contextually relevant. So if the file says "run pnpm test before committing", the agent will run pnpm test before it makes a commit. [01:54] agents.md files can also be nested, meaning there can be multiple of them. Maybe we have one at the root and then others for sub-projects, each with its own set of rules. Files closer to the working directory override the earlier ones because they appear later.
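The nesting rule just described (files closer to the working directory appear later and so override earlier ones) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular agent implements it: it assumes the file is spelled AGENTS.md (the spelling the agents.md convention uses), that the agent collects every such file from the repository root down to its working directory, and that it simply concatenates them root first. The function names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def collect_agents_md(repo_root: Path, working_dir: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return every instruction file from repo_root down to working_dir, root first."""
    root = repo_root.resolve()
    current = working_dir.resolve()
    found: list[Path] = []
    while True:
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        if current.parent == current:  # hit the filesystem root without meeting repo_root
            raise ValueError(f"{working_dir} is not inside {repo_root}")
        current = current.parent
    found.reverse()  # root file first, closest file last
    return found


def build_instructions(repo_root: Path, working_dir: Path) -> str:
    """Concatenate the files in order, so the closest file's rules come last.

    Assumption: the agent treats later text as overriding earlier text.
    """
    texts = [p.read_text(encoding="utf-8").strip() for p in collect_agents_md(repo_root, working_dir)]
    return "\n\n".join(texts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "AGENTS.md").write_text("Run pnpm test before committing.\n", encoding="utf-8")
        web = root / "packages" / "web"
        web.mkdir(parents=True)
        (web / "AGENTS.md").write_text("PR titles in this package start with [web].\n", encoding="utf-8")
        print(build_instructions(root, web))
```

Concatenating root first keeps the design simple: nothing is deleted, and the sub-project's rules are the last thing the model reads. Some tools may instead use only the nearest file, so treat the merge strategy as an assumption.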
Now, agents.md was introduced by OpenAI and later contributed to the Agentic AI Foundation (AAIF), which runs under the Linux Foundation. A quick wrinkle worth mentioning here: some agents use a different file name from agents.md. Claude, for example, does this. [02:33] Claude's file is called CLAUDE.md, because of course it is. It's a different name, but it's more or less the same idea. So agents.md is read by an agent every time it starts work in a given project. But what about knowledge the agent needs only sometimes, and that isn't necessarily project specific? Let's say the agent needs to know how to build a PowerPoint deck. Loading all of that context every single time the agent starts would really clog up the context window for no reason [03:11] if the task at hand has nothing to do with PowerPoint slides. That brings us to term number two: agent skill. So what's that? Well, an agent skill is a folder, and inside that folder is a file called skill.md. So .md again, more markdown. Also in that folder are whatever scripts or resources the task needs. Inside skill.md is some metadata, including a description, and that tells the agent something like "invoke me when the user wants to X". [03:56] So X could be when the user wants to make a PowerPoint. If the user's request matches that description, the agent pulls the skill in. If it doesn't match, the skill just sits there out of the way, not taking up any context. Agent skills are another open standard, and they're supported by multiple agent platforms. agents.md tells the agent how a specific project works, and an agent skill tells the agent how to do a specific kind of task. All right, that's two of our five terms down. [04:32] The agent now knows what to do, but doing things also means reaching outside the box, as in outside the AI agent itself.
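The load-on-demand behavior of agent skills can be sketched the same way. In this hypothetical Python version, only each skill's name and description are read at startup, and the full file is read once a request matches. Two loud assumptions: the file is spelled SKILL.md with a simple name/description header between --- lines, and the matching step uses naive keyword overlap purely as a stand-in, since in a real agent the model itself decides whether a request fits a description.

```python
from __future__ import annotations

import re
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Words ignored when comparing a request with a description (assumption for this sketch).
STOPWORDS = {"a", "about", "an", "and", "for", "i", "me", "my", "of", "or", "please",
             "the", "this", "to", "use", "user", "wants", "when", "with"}


@dataclass
class SkillStub:
    """What the agent keeps in context for each skill before it is selected."""
    name: str
    description: str
    folder: Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines between leading '---' markers (not a full YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover_skills(skills_dir: Path) -> list[SkillStub]:
    """At startup, read only each skill's metadata; instructions and scripts stay on disk."""
    stubs = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            stubs.append(SkillStub(meta["name"], meta["description"], skill_md.parent))
    return stubs


def content_words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def select_skills(request: str, stubs: list[SkillStub], min_overlap: int = 2) -> list[SkillStub]:
    """Naive stand-in for the model's judgment: keep skills whose description shares enough words."""
    wanted = content_words(request)
    return [s for s in stubs if len(wanted & content_words(s.description)) >= min_overlap]


def load_skill(stub: SkillStub) -> str:
    """Only a selected skill's full SKILL.md is read into context."""
    return (stub.folder / "SKILL.md").read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pptx").mkdir()
        (root / "pptx" / "SKILL.md").write_text(
            "---\nname: pptx\ndescription: Use when the user wants to build a PowerPoint slide deck.\n---\n"
            "Full deck-building instructions go here.\n", encoding="utf-8")
        for skill in select_skills("Please build a slide deck about Q3", discover_skills(root)):
            print(load_skill(skill))
```

A request unrelated to slides matches nothing, so no skill body is loaded and the context stays as small as it was at startup.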
So that's where we're going to go next. Agents need to reach all kinds of external things, like APIs, databases, developer tools, SaaS platforms, you name it. The challenge here is that every one of those targets might have its own interface. So without some kind of standard, every AI agent would need a custom connector for every external thing it might touch, which would be a mess. [05:05] That brings us to term number three: MCP, the Model Context Protocol. MCP is an open protocol for connecting AI applications to tools, data sources, and workflows, and it comes with something called an MCP server. An MCP server wraps up a tool or a data source in a standard interface, and any agent that can speak MCP can then talk to that tool. So let's say an agent needs to pull data from something in Notion. So we've got Notion here, or maybe it needs to go to a Stripe payment link, whatever the backend is. [05:52] Well, the agent speaks MCP to the server, and it's the server that handles the underlying API for, in this case, Notion. MCP started at Anthropic and is now governed under the AAIF, again at the Linux Foundation, and it has broad industry support. So that covers agents talking to tools and data. What about agents talking to other agents? Well, time for term number four: A2A, otherwise known as agent to agent. A2A is an open protocol for agent-to-agent communication. Let's think of a scenario for using this. [06:39] Let's say we've got a procurement agent here that handles vendor contracts, and maybe we've also got a finance agent over here that approves spend. And yeah, I know, financial processing stuff. Try to
contain your excitement. The procurement agent needs to negotiate a contract and then hand it off to the finance agent for approval. Without A2A, these two agents would need some form of custom integration, or they wouldn't coordinate very well. With A2A, each agent publishes something called an [07:20] agent card. That's basically a description of what the agent does and how to talk to it. Other agents can read that card and then figure out how to delegate work. In this case, the procurement agent finds the finance agent's agent card, reads it, and then hands off the contract. So that's A2A, and the A2A standard comes from Google. It's now also an open standard under, you guessed it, the Linux Foundation. So MCP is how agents talk to tools and data, and A2A is how agents talk to each other. [08:06] All right, so how are we doing here? The agent knows what to do, and it knows how to reach outside its borders. What else? Well, sometimes one agent just isn't enough. Maybe the task is too big for one context window: say the agent is reviewing a code base with thousands of files, and loading every file would blow out the context on its own. Or maybe the work is embarrassingly parallel, like running a check on 20 different functions where each check is independent. You could do those one at a time, but that's slow; [08:44] doing them all at once could be up to 20 times faster. So, term number five that you need to know: subagents, which means spawning and using multiple agents. A subagent is a child agent that the main agent spawns to do a specific piece of work. Each subagent runs in its own fresh context window, does its job, and returns a result when it's done. So this main agent here could spawn a subagent and give it some work to do. Let's say: go read 500 files, and then hand back to the main agent a summary of those files.
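The subagent pattern described here can be sketched with Python's standard thread pool. The model call is a hypothetical stub (fake_model) so the sketch runs offline; what it shows is that each subagent starts from an empty context, the independent tasks run concurrently, and only the short results are appended to the parent's context. The Context class and all function names are illustrative, not any agent framework's API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Context:
    """A context window, modeled as a list of messages; every agent owns its own instance."""
    messages: list[str] = field(default_factory=list)


def fake_model(context: Context) -> str:
    """Hypothetical stand-in for an LLM call: reports the task and how many messages it could see."""
    return f"checked {context.messages[-1]}; context size {len(context.messages)}"


def run_subagent(task: str) -> str:
    """Child agent: starts from a fresh, empty context, so none of the parent's history is copied in."""
    ctx = Context()
    ctx.messages.append(task)
    return fake_model(ctx)


def main_agent(tasks: list[str], max_workers: int = 20) -> Context:
    """Parent agent: fans independent tasks out to subagents and keeps only their short results."""
    parent = Context(messages=["system: you are the main agent"])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the subagents concurrently and yields results in task order.
        results = list(pool.map(run_subagent, tasks))
    parent.messages.extend(f"result: {r}" for r in results)
    return parent


if __name__ == "__main__":
    parent = main_agent([f"function_{i}" for i in range(20)])
    print(len(parent.messages))  # 1 system message + 20 results = 21
```

Threads fit this sketch because real model calls are network-bound; each result line reports a context size of 1, showing that no subagent saw the parent's history.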
[09:33] That keeps the main agent's context window pretty clean. And we could have lots of agents in parallel; maybe we've got 20 agents here running in parallel, handling 20 independent checks at the same time. Now, subagents are a little different from the other four terms, because subagents are a common pattern in modern agent systems, but they don't have a formal standard document behind them. Still, the concept shows up almost identically everywhere. The basic idea is that you have this big parent agent here. [10:09] That parent agent spawns one or more child agents. Each child gets a fresh context, does whatever work it was told to do, and then returns a result, and the parent carries on with its context intact. So there we've got five terms. We've got agents.md and agent skills, which live inside the agent and shape how it behaves. We've got MCP and A2A; that's how the agent reaches outward to tools and to other agents. And we've got subagents; that's how the agent handles work that doesn't fit into one context. [10:50] That's what a frontier AI agent actually looks like under the hood today.
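To make the MCP part of the recap concrete, here is a toy Python sketch of the message shape. MCP messages are JSON-RPC 2.0, and tools/list and tools/call are the protocol's method names for listing and invoking tools; everything else is a simplification. ToyToolServer and lookup_page are hypothetical, the backend is an in-memory dictionary standing in for an API like Notion's, and a real MCP server also handles transport, the initialization handshake, and capability negotiation, all omitted here.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def lookup_page(title: str) -> str:
    """Return the text of a workspace page by title."""
    # Hypothetical backend: an in-memory dict standing in for a real API such as Notion's.
    pages = {"Roadmap": "Q3 goals: ship agent skills"}
    return pages.get(title, "not found")


class ToyToolServer:
    """Wraps plain Python functions behind one JSON-RPC 2.0 interface, loosely modeled on an MCP server.

    Omitted: transport, the initialize handshake, and capability negotiation.
    """

    def __init__(self) -> None:
        self.tools: dict[str, tuple[Callable[..., str], dict[str, Any]]] = {}

    def register(self, name: str, fn: Callable[..., str], input_schema: dict[str, Any]) -> None:
        self.tools[name] = (fn, input_schema)

    def _reply(self, req_id: Any, *, result: Any = None, error: dict[str, Any] | None = None) -> str:
        msg: dict[str, Any] = {"jsonrpc": "2.0", "id": req_id}
        if error is not None:
            msg["error"] = error
        else:
            msg["result"] = result
        return json.dumps(msg)

    def handle(self, raw: str) -> str:
        req = json.loads(raw)
        method, req_id = req.get("method"), req.get("id")
        if method == "tools/list":
            tools = [{"name": n, "description": fn.__doc__ or "", "inputSchema": schema}
                     for n, (fn, schema) in self.tools.items()]
            return self._reply(req_id, result={"tools": tools})
        if method == "tools/call":
            params = req.get("params", {})
            entry = self.tools.get(params.get("name"))
            if entry is None:
                return self._reply(req_id, error={"code": -32602, "message": f"Unknown tool: {params.get('name')}"})
            text = entry[0](**params.get("arguments", {}))
            return self._reply(req_id, result={"content": [{"type": "text", "text": text}], "isError": False})
        return self._reply(req_id, error={"code": -32601, "message": "Method not found"})


def call_tool(server: ToyToolServer, req_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Client side: build a tools/call request, send it, and decode the reply."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    return json.loads(server.handle(json.dumps(request)))
```

After server.register("lookup_page", lookup_page, {"type": "object"}), a call such as call_tool(server, 1, "lookup_page", {"title": "Roadmap"}) returns a response whose result.content holds the page text. The agent only ever speaks this one message format; the server hides the backend's own API.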
