Transcript & Summary: 5 AI Agent Terms You Need to Know

IBM Technology

Summary

Inside an agent there is a large language model that generates text and performs reasoning, and an instruction layer wrapped around it turns the model into an agent. Term number one is agents.md: a markdown file at the root of a project that the agent reads whenever it starts work there, to learn which commands to run, setup steps, coding conventions, and formatting rules. These files can be nested, and files closer to the working directory override earlier ones. Term number two is agent skill: a folder containing skill.md, whose metadata describes when to invoke the skill, along with the scripts or resources the skill needs. A skill is loaded only when it is relevant. Term number three is MCP, the Model Context Protocol, an open protocol in which an MCP server wraps a tool or data source in a standard interface so the agent can talk to Notion, Stripe, or other backends. Term number four is A2A, agent-to-agent communication, in which agents publish an agent card describing what they do and how to talk to them so other agents can delegate work. Term number five is subagents: child agents spawned by a main agent to handle a piece of work in a fresh context, enabling parallelism and returning results to the parent.

Full transcript

[00:00] Frontier AI agents are pretty capable. They're really good at planning out tasks and writing code with minimal human involvement, but there are a handful of specific pieces under the hood that enable this. So let's cover five of those pieces, the five terms you need to know about agentic AI, and let's start with the stuff inside the agent that shapes how it behaves. Inside an agent, of course, there is a model, a large language model. That's what does the actual text generation and the reasoning, and by [00:37] itself, well, it's just a conversational partner. What turns it into an agent is the instruction layer wrapped around the model. So that brings us to term number one that you need to know: agents.md. So what's that? Well, .md is markdown, so it's just a text file. It sits at the root of a project, and whenever the agent starts work in that project, it reads whatever is in that agents.md file. The file tells the agent things like which commands to run for tests [01:19] or which coding conventions this code base uses. So we can think of it as kind of like a README file, but a README written specifically for agents. It tells the agent things like specific setup commands to use, any code style rules, or maybe how a PR title should be formatted. The agent executes the commands it finds in agents.md when they're contextually relevant. So if the file says run pnpm test before committing, then the agent will run pnpm test before it does a commit. [01:54] And agents.md files can also be nested, meaning there can be more than one of them. So maybe we have one at the root and then others for sub-projects, each with its own set of rules. Files closer to the working directory override the earlier ones because they appear later. Now, agents.md was introduced by OpenAI and later contributed to the Agentic AI Foundation (AAIF), which runs under the Linux Foundation.
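The nesting rule above, where files closer to the working directory win because they appear later, can be sketched in Python. This is a minimal sketch, not any agent's actual loader: it assumes the file is literally named AGENTS.md (the uppercase spelling the agents.md convention uses) and that the agent simply reads the collected files root-first, so the most specific rules come last.

```python
from pathlib import Path

# Assumed file name; the transcript writes it as "agents.md".
AGENTS_FILE = "AGENTS.md"


def applicable_agents_files(project_root: Path, working_dir: Path) -> list[Path]:
    """Collect agent instruction files from the project root down to working_dir.

    The list is ordered root-first, so a file closer to the working directory
    appears later and, when read in order, overrides the more general rules.
    """
    root = project_root.resolve()
    cwd = working_dir.resolve()
    # Raises ValueError if working_dir is not inside project_root.
    relative = cwd.relative_to(root)

    # Walk root -> cwd, one directory level at a time.
    directories = [root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / AGENTS_FILE for d in directories if (d / AGENTS_FILE).is_file()]


def build_instructions(files: list[Path]) -> str:
    """Concatenate the files so the most specific instructions come last."""
    return "\n\n".join(f.read_text(encoding="utf-8") for f in files)
```

Treating precedence as "later text wins" mirrors the description in the transcript; the sketch leaves conflict resolution to the model instead of merging rules programmatically.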
Now, a quick wrinkle worth mentioning here: some agents use a different file name from agents.md, and Claude, for example, does this. [02:33] Claude's version is called Claude.md, because of course it is. So it's a different name, but more or less the same idea. So agents.md is read by an agent every time it starts work in a given project. But what about knowledge that the agent only needs sometimes and that isn't necessarily project-specific? Let's say the agent needs to know how to build a PowerPoint deck. Loading all of that context every single time the agent starts would really clog up the context window for no good reason [03:11] if the task at hand has nothing to do with PowerPoint slides. So that brings us to term number two, and term number two is agent skill. So what's that? Well, an agent skill is a folder, and inside that folder is a file called skill.md. So .md again, that's more markdown. Also in that folder are whatever scripts or resources the task needs. Inside skill.md is some metadata, including a description, and that tells the agent something like, invoke me when the user wants to X. [03:56] So X could be when the user wants to make a PowerPoint. If the user's request matches that description, the agent pulls the skill in. If it doesn't match, the skill just sits there out of the way, not taking up any context. Agent skills are another open standard, and they're supported by multiple agent platforms. So agents.md tells the agent how a specific project works, and an agent skill tells the agent how to do a specific kind of task. All right, so that's two of our five terms down. [04:32] The agent now knows what to do, but doing things also means reaching outside the box, as in outside the AI agent itself. So that's where we're going next. Agents need to reach all kinds of external things, like APIs, databases, developer tools, SaaS platforms, you name it.
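The load-on-demand behavior of skills can be sketched the same way. This sketch assumes skill.md begins with a front-matter block between `---` lines holding `name` and `description` keys (the layout of the Agent Skills format, which spells the file SKILL.md). The keyword match in `relevant_skills` and its stopword list are hypothetical and deliberately crude; a real agent lets the model judge whether a request fits a description.

```python
import re
from dataclasses import dataclass
from pathlib import Path

SKILL_FILE = "SKILL.md"  # assumed spelling; the transcript says skill.md

# Hypothetical list of words too common to signal relevance.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "or", "in", "with",
             "when", "user", "wants", "want", "use", "make", "me", "i",
             "please", "invoke", "this"}


@dataclass
class Skill:
    name: str
    description: str
    folder: Path


def read_skill_metadata(folder: Path) -> Skill:
    """Read only the front matter, not the full instructions, from a skill folder."""
    lines = (folder / SKILL_FILE).read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError(f"{folder} has no front matter")
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")  # split on the first colon only
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], folder)


def keywords(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def relevant_skills(skills: list[Skill], request: str) -> list[Skill]:
    """Hypothetical relevance test: any keyword shared by request and description."""
    wanted = keywords(request)
    return [s for s in skills if wanted & keywords(s.description)]


def load_skill(skill: Skill) -> str:
    """Only at this point does the full skill file enter the agent's context."""
    return (skill.folder / SKILL_FILE).read_text(encoding="utf-8")
```

Until `load_skill` runs, only each skill's one-line description costs any context, which is the point the transcript makes about skills sitting "out of the way".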
The challenge with all of those external targets is that each one might have its own interface. So without some kind of standard, every AI agent would need a custom connector for every external thing it might touch, which would be a mess. [05:05] So that brings us to term number three, MCP, the Model Context Protocol. MCP is an open protocol for connecting AI applications to tools, data sources, and workflows, and it comes with something called an MCP server. An MCP server wraps a tool or a data source in a standard interface, and any agent that can speak MCP can then talk to that tool. So let's say an agent needs to pull data from Notion, or maybe it needs a Stripe payment link, whatever the backend is. [05:52] The agent speaks MCP to the server, and it's the server that handles the underlying API for, in this case, Notion. Now, MCP started at Anthropic and is now governed under the AAIF, again at the Linux Foundation, and it has broad industry support. So that covers agents talking to tools and data. What about agents talking to other agents? Well, time for term number four: A2A, otherwise known as agent to agent. A2A is an open protocol for agent-to-agent communication. So let's think of a scenario for using this. [06:39] Let's say we've got a procurement agent here that handles vendor contracts, and maybe we've also got a finance agent over here that approves spend. And yeah, I know, financial processing stuff, try to contain your excitement. The procurement agent needs to negotiate a contract and then hand off to the finance agent for approval. Without A2A, these two agents would need some form of custom integration, or they wouldn't really coordinate very well. But with A2A, each agent publishes something called an [07:20] agent card, and that's basically a description of what the agent does and how to talk to it.
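Both protocols in this stretch come down to structured JSON messages, and the Python sketch below shows their rough shape without any network I/O. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with `name` and `arguments` params. Everything else here is hypothetical: the `search_pages` tool, its fake Notion handler, the tool registry, and both agent cards, which are trimmed to a few fields (`name`, `description`, `url`, `skills`) of the A2A agent card format.

```python
import json
from typing import Optional

# ---- MCP: an agent sends a JSON-RPC 2.0 request to an MCP server ----


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a tools/call request the way an MCP client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def fake_notion_search(query: str) -> str:
    # Hypothetical stand-in: a real server would call the backend's API here.
    return f"2 pages match '{query}'"


TOOLS = {"search_pages": fake_notion_search}  # hypothetical tool registry


def mcp_server_handle(raw: str) -> dict:
    """Dispatch a tools/call request to the wrapped backend and build the reply."""
    request = json.loads(raw)
    params = request["params"]
    # An unknown tool name raises KeyError; a real server would return an error.
    text = TOOLS[params["name"]](**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


# ---- A2A: agents publish cards describing what they do (trimmed, hypothetical) ----

CARDS = [
    {"name": "Procurement Agent", "description": "Negotiates vendor contracts",
     "url": "https://procurement.example.com/a2a",
     "skills": [{"id": "negotiate", "name": "Negotiate contract", "tags": ["contracts"]}]},
    {"name": "Finance Agent", "description": "Approves spend",
     "url": "https://finance.example.com/a2a",
     "skills": [{"id": "approve-spend", "name": "Approve spend", "tags": ["approval", "finance"]}]},
]


def find_delegate(cards: list[dict], tag: str) -> Optional[tuple[str, str]]:
    """Return (endpoint URL, skill id) of the first card advertising the tag."""
    for card in cards:
        for skill in card.get("skills", []):
            if tag in skill.get("tags", []):
                return card["url"], skill["id"]
    return None
```

In the procurement scenario, the procurement agent would call something like `find_delegate(CARDS, "approval")` to locate the finance agent before handing off the contract.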
Other agents can read that card and figure out how to delegate work. In this case, the procurement agent finds and reads the finance agent's card and then hands off the contract. So that's A2A, and the A2A standard comes from Google. It's now also an open standard under, you guessed it, the Linux Foundation. So MCP is how agents talk to tools and data, and A2A is how agents talk to each other. [08:06] All right, so how are we doing? The agent knows what to do, and it knows how to reach outside its borders. What else? Well, sometimes one agent just isn't enough. Maybe the task is too big for one context window: say the agent is reviewing a code base with thousands of files, and loading every file would blow out the context on its own. Or maybe the work is embarrassingly parallel, like running a check on 20 different functions where each check is independent. You could do those one at a time, but that's slow; [08:44] doing them all at once could be up to 20 times faster. So, term number five that you need to know: subagents, which means spawning and using multiple agents. A subagent is a child agent that the main agent spawns to do a specific piece of work. Each subagent runs in its own fresh context window, does its job, and returns a result when it's done. So this main agent here could spawn a subagent and give it some work to do. Let's say, go read 500 files and hand back to the main agent a summary of those files. [09:33] That keeps the main agent's context window pretty clean. And we could have lots of agents in parallel, maybe 20 agents handling 20 independent checks at the same time. Now, subagents are a little bit different from the other four terms, because subagents are a common pattern in modern agent systems but don't really have a formal standard document behind them.
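The fan-out pattern described above can be sketched with Python's standard `concurrent.futures`. This is a minimal sketch under stated assumptions: `run_subagent` is a hypothetical placeholder for a real child agent, which would call a model and tools, and threads fit here only because such calls mostly wait on I/O. What it shows is the shape of the pattern: each child starts from a fresh context holding nothing but its task, and only a short summary flows back into the parent's context.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical child agent with its own fresh context."""
    context = [f"TASK: {task}"]  # fresh context: only the assigned task
    # A real subagent would call a model and tools here, growing `context`
    # with file contents, tool output, and so on. This only simulates that.
    context.append(f"(details gathered while checking {task})")
    # Only a short summary is returned; `context` is discarded with the child.
    return f"{task}: ok ({len(context)} context entries used)"


def parent_agent(tasks: list[str], max_workers: int = 20) -> list[str]:
    """Spawn one subagent per independent task and collect their summaries."""
    parent_context = ["Goal: run an independent check on each function"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so summaries line up with tasks.
        summaries = list(pool.map(run_subagent, tasks))
    parent_context.extend(summaries)  # the parent carries on with its context intact
    return parent_context
```

Because each child's working context is thrown away, the parent grows by only one line per task, which is the context-saving effect described in the transcript.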
But the concept shows up almost identically everywhere. The very basic idea is that you have this big parent agent here. [10:09] That parent agent spawns one or more child agents. The child gets its own fresh context, does whatever work it was told to do, and returns a result, and the parent carries on with its context intact. So there we've got five terms. We've got agents.md and agent skills, which live inside the agent and shape how it behaves. We've got MCP and A2A, which are how the agent reaches outward to tools and to other agents. And we've got subagents, which are how the agent handles work that doesn't fit into one context. [10:50] That's what a frontier AI agent actually looks like under the hood today.
