Transcript & Summary: 5 AI Agent Terms You Need to Know
IBM Technology
Summary
Inside an agent there is a large language model that generates text and performs reasoning, and an instruction layer wrapped around it turns the model into an agent. Term number one is agents.md: a markdown file at the root of a project that the agent reads on startup to learn which commands to run, setup steps, coding conventions, and formatting rules. Projects can nest several of these files, and files closer to the working directory override the ones nearer the root. Term number two is agent skill: a folder containing skill.md, whose metadata describes when to invoke the skill, plus the scripts or resources the task needs. A skill is loaded only when it is relevant. Term number three is MCP, the Model Context Protocol, an open protocol in which an MCP server provides a standard interface to tools and data sources so the agent can talk to Notion, Stripe, or other backends. Term number four is A2A, agent-to-agent communication, where each agent publishes an agent card describing what it does and how to talk to it so other agents can delegate work. Term number five is subagents, child agents spawned by a main agent to handle a piece of work in a fresh context, enabling parallelism and returning results to the parent.
Full transcript
[00:00] Frontier AI agents, they're pretty capable. They're really good at planning out tasks and
writing code with minimal human involvement but there are a handful of specific
pieces under the hood that enable this. So let's cover five of those pieces, the
five terms you need to know about agentic AI and let's start with stuff that's inside
the agent that kind of shapes how it behaves. Inside an agent of course there is
a model, a large language model. That's what's doing the actual text
generation and the reasoning and by
[00:37] itself well it's just a conversational partner. What turns it into an agent is the instruction
layer that's wrapped around the model. So that brings us to term number one, term
number one that you need to know, that is agents.md. So what's that? Well, .md, that's markdown,
so it's just a text file. It sits at the root of a project, and whenever
the agent starts work in that project, it reads whatever is in that agents.md file. Now the file tells the agent things
like which commands to run for tests
[01:19] or which coding conventions this code base uses. So we can really think of this
as being kind of like a README file, but it's a README file
specifically written for agents. It tells the agents things like
specific setup commands to use and any code style rules or maybe
how a PR title should be formatted. So the agent executes the commands it finds in
agents.md when they're contextually relevant. So if the file says run pnpm test
before committing, well, then the agent will run pnpm test before it does a commit.
[01:54] And agents.md files can also be nested,
meaning there can be multiple of them. So maybe we have one at the root and then multiple other ones for sub-projects,
each with its own set of rules. And files that are closer to the working directory override the earlier ones
because they appear later. Now agents.md was introduced by OpenAI
and later contributed to the Agentic AI Foundation that runs under the Linux Foundation. Now a quick wrinkle worth mentioning
here: some agents use a different file name from agents.md, so
Claude, for example, does this.
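Editor's sketch: the nesting rule described in this segment can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular agent implements it. The function names `collect_instruction_files` and `build_instructions` are hypothetical, and the `filename` parameter is there only because, as noted above, some agents use a different file name.

```python
from pathlib import Path


def collect_instruction_files(project_root: Path, working_dir: Path,
                              filename: str = "AGENTS.md") -> list[Path]:
    """Return instruction files from the project root down to working_dir.

    The list is ordered root first, so later entries are closer to the
    working directory and are meant to take precedence.
    """
    project_root = project_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside project_root.
    relative = working_dir.relative_to(project_root)

    # Walk down one directory at a time: root, root/a, root/a/b, ...
    directories = [project_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / filename for d in directories if (d / filename).is_file()]


def build_instructions(files: list[Path]) -> str:
    """Concatenate files in order, so closer files appear later in the text."""
    return "\n\n".join(f.read_text(encoding="utf-8") for f in files)
```

Calling `collect_instruction_files(root, root / "packages" / "web")` on a project with a file at the root and another in `packages/web` would return the root file first and the sub-project file second, which is the ordering that lets the closer file win.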
[02:33] Claude's one is actually called
CLAUDE.md, because of course it is, so it's a different name but it's
more or less the same idea. So agents.md is read by an agent every
time it starts work in a given project. But what about knowledge that the agent only needs
sometimes and isn't necessarily project specific? So let's say the agent needs to
know how to build a PowerPoint deck. Well, loading all of that context
every single time the agent starts, that would just really clog up the
context window for no real reason
[03:11] if the task at hand has nothing
to do with PowerPoint slides. So that brings us to term
number two, and term number two is agent skill. So what's that? Well, an agent skill is a folder, and inside that folder is
a file, and that file is called skill.md. So .md again, that's more markdown. Now also in that folder is whatever scripts or resources
the task needs, and then inside skill.md is some metadata, including a description. And that tells the agent something like,
invoke me when the user wants to X.
[03:56] So X could be when the user
wants to make a PowerPoint. And if the user's request matches that
description, the agent pulls the skill in. If it doesn't match, well, the
skill is just gonna kind of sit there out of the way, not taking up any context. Agent skills are another open standard and
they're supported by multiple agent platforms. Agents.md, that's how a specific project works, and an agent skill tells the agent
how to do a specific kind of task. All right, so that's two of our five terms down.
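Editor's sketch: the load-only-when-relevant behaviour of skills can be shown with a toy example. It assumes skill.md starts with a metadata block delimited by `---` lines containing `name` and `description` keys; that layout, the `Skill` class, and the keyword matcher are illustrative assumptions. In a real agent the model itself judges whether a request matches a description, so the word-overlap check here is only a stand-in.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short metadata, always visible to the agent
    body: str         # full instructions, loaded only when the skill matches


def parse_skill_md(text: str) -> Skill:
    """Parse a skill.md with a '---' delimited metadata block (assumed layout)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected a metadata block delimited by '---'")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("metadata block is not closed")
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], "\n".join(lines[end + 1:]).strip())


def _keywords(text: str) -> set[str]:
    # Naive stand-in for the model's judgement: lowercase words longer than 4 letters.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4}


def build_context(request: str, skills: list[Skill]) -> list[str]:
    """Always include descriptions; include a body only when the request matches."""
    context = [f"{s.name}: {s.description}" for s in skills]
    for s in skills:
        if _keywords(request) & _keywords(s.description):
            context.append(s.body)
    return context


DECK_SKILL = parse_skill_md("""
---
name: make-deck
description: Invoke me when the user wants to make a PowerPoint deck.
---
Step-by-step instructions for building slides go here.
""")
```

With this sketch, a request that mentions a PowerPoint deck pulls the skill body into the context, while an unrelated request leaves only the one-line description there.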
[04:32] The agent now knows what to do, but doing
things also means reaching outside the box, as in outside the AI agent itself. So that's where we're going to go next. So agents need to reach all kinds of
external things like APIs or databases or developer tools or SaaS platforms, you name it. And the challenge here is that every one of
those targets might have its own interface. So without some kind of standard
every AI agent would need a custom connector for every external thing
it might touch which would be a mess.
[05:05] So that brings us to term number three, MCP - Model Context Protocol. Now MCP is an open protocol for connecting
AI applications to tools and data sources and workflows and it comes with
something called an MCP server. Now an MCP server wraps up a tool or
a data source in a standard interface and any agent that can speak
MCP can now talk to that tool. So let's say an agent needs to pull data
from it needs to go to something in Notion. So we've got Notion here, or maybe it needs to go
a Stripe payment link, whatever the backend is.
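Editor's sketch: the idea of a server wrapping a tool behind one standard interface can be illustrated with a toy, in-process example. This does not use a real MCP SDK and has no transport. The message shapes loosely follow MCP's JSON-RPC style (`tools/list`, `tools/call`) and are trimmed for brevity, so treat them as an approximation and not the specification. `ToyToolServer`, `search_notes`, and the `NOTES` dictionary are hypothetical, and the dictionary only stands in for a real backend.

```python
from typing import Callable


class ToyToolServer:
    """Exposes registered tools through one uniform request/response shape."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[str, Callable[..., str]]] = {}

    def register(self, name: str, description: str, handler: Callable[..., str]) -> None:
        self._tools[name] = (description, handler)

    def handle(self, request: dict) -> dict:
        reply = {"jsonrpc": "2.0", "id": request.get("id")}
        method = request.get("method")
        params = request.get("params", {})
        if method == "tools/list":
            # Discovery: the client learns which tools exist.
            reply["result"] = {"tools": [{"name": n, "description": d}
                                         for n, (d, _) in self._tools.items()]}
        elif method == "tools/call" and params.get("name") in self._tools:
            # Invocation: the server, not the agent, talks to the backend.
            _, handler = self._tools[params["name"]]
            text = handler(**params.get("arguments", {}))
            reply["result"] = {"content": [{"type": "text", "text": text}]}
        elif method == "tools/call":
            reply["error"] = {"code": -32602, "message": "unknown tool"}
        else:
            reply["error"] = {"code": -32601, "message": "method not found"}
        return reply


# Hypothetical backend: an in-memory dict standing in for a notes service.
NOTES = {"Q3 roadmap": "Ship the billing revamp.", "Hiring plan": "Two backend roles."}


def search_notes(query: str) -> str:
    hits = [title for title in NOTES if query.lower() in title.lower()]
    return ", ".join(hits) if hits else "no matches"


server = ToyToolServer()
server.register("search_notes", "Search note titles for a query string.", search_notes)
```

The point of the design is that the caller only ever builds the same two kinds of request, whatever sits behind the handler, so swapping the backend does not change the agent side.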
[05:52] Well, the agent speaks MCP to the server and it's the server now that handles the
underlying API for, in this case, Notion. Now, MCP started at Anthropic
and is now governed under the AAIF, again at the Linux Foundation. And it has broad industry support. So that covers agents talking to tools and data. What about agents talking to other agents? Well, time for term number four. That is A2A, otherwise known as agent to agent. So A2A is an open protocol for
agent to agent communication. So let's kind of think of
a scenario for using this.
[06:39] Let's say we've got a procurement agent
here and that handles vendor contracts. And then maybe we've also got a finance
agent over here and that approves spend. And yeah, I know, financial
processing stuff, try to contain your excitement. But the
procurement agent needs to negotiate a contract and then it needs to hand off
to the finance agent for approval, and without A2A these two agents would
need some form of custom integration or they wouldn't really coordinate very well but
with A2A each agent publishes something called an
[07:20] agent card. And that's just basically a description
of what the agent does and how to talk to it. And other agents can read that card and
then figure out how to delegate work. The procurement agent in this case
is going to find the agent card and read it for the finance agent
and then hand off the contract. So that's A2A and this A2A
standard comes from Google. It's now also an open standard under,
you guessed it, the Linux foundation. So MCP is how agents talk to tools and data
and A2A is how agent's talk to each other.
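Editor's sketch: the procurement-to-finance handoff described above can be modelled with a simplified agent card. The fields, the `find_agent` and `hand_off` helpers, and the example.com URL are all hypothetical. A real A2A card carries more detail than this and the message format is defined by the protocol, so this only shows the read-the-card-then-delegate flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    name: str
    description: str         # what the agent does
    url: str                 # how to reach it; example.com is a placeholder
    skills: tuple[str, ...]  # simplified: real cards describe skills in more detail


def find_agent(cards: list[AgentCard], needed_skill: str) -> Optional[AgentCard]:
    """Read the published cards and pick the first agent offering the skill."""
    for card in cards:
        if needed_skill in card.skills:
            return card
    return None


def hand_off(sender: str, card: AgentCard, skill: str, payload: dict) -> dict:
    """Build the delegation message; a real client would send it to card.url."""
    return {"from": sender, "to": card.name, "endpoint": card.url,
            "skill": skill, "payload": payload}


FINANCE_CARD = AgentCard(
    name="finance-agent",
    description="Approves spend.",
    url="https://finance.example.com/a2a",
    skills=("approve_spend",),
)

target = find_agent([FINANCE_CARD], "approve_spend")
message = hand_off("procurement-agent", target, "approve_spend",
                   {"vendor": "Acme", "amount": 12000})
```

Because the procurement agent learns the endpoint and skills from the card, nothing about the finance agent has to be hard-coded into it ahead of time.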
[08:06] All right, so how are we doing here? Now the agent knows what to do and it
knows how to reach outside of its borders. What else? Well, sometimes one agent just isn't enough. Maybe the task is too big for one context window. So say the agent's reviewing a code base
with thousands of files: loading every file, that would blow out the context on its own. Or maybe the work is embarrassingly parallel, like you've got to run a check on 20 different
functions and each check is independent, and you could do those one
at a time but that's slow,
[08:44] doing them all at once would be 20 times faster. So, term number five that you need to know. It's subagents, which means using
and spawning multiple agents. So a subagent is a child agent that the main
agent spawns to do a specific piece of work, and each subagent runs in
its own fresh context window, it does its job and it
returns a result when it is done. So this main agent here, it could
spawn a subagent and give it some work to do. Let's say go read 500 files, and then just kind of hand back to the
main agent a summary of those files.
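Editor's sketch: the spawn, work, and return-a-summary pattern can be illustrated with a thread pool. Everything here is a stand-in. `run_subagent` fakes the file reading and the model call, the "context" is just a Python list, and the names are hypothetical. Threads are used on the assumption that a real subagent spends most of its time waiting on model or I/O calls.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str, files: list[str]) -> str:
    """Do one piece of work in a fresh context and return only a summary."""
    context = [task]  # fresh context: nothing inherited from the parent
    context.extend(f"<contents of {name}>" for name in files)  # stand-in for reading
    return f"{task}: reviewed {len(context) - 1} files"


def run_parent(all_files: list[str], batch_size: int) -> list[str]:
    """Split the work, run one subagent per batch in parallel, keep only summaries."""
    parent_context = ["goal: review the code base"]
    batches = [all_files[i:i + batch_size]
               for i in range(0, len(all_files), batch_size)]
    tasks = [f"batch {n}" for n in range(len(batches))]
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        # map() preserves input order, so summaries line up with batches.
        summaries = list(pool.map(run_subagent, tasks, batches))
    parent_context.extend(summaries)  # file contents never enter the parent context
    return parent_context
```

Running `run_parent` over 500 file names with a batch size of 25 starts 20 subagents, and the parent ends up holding its goal plus 20 one-line summaries instead of 500 files' worth of content.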
[09:33] So that would keep the main agent's
context window pretty clean. And we could have lots of agents in
parallel, maybe we've got like 20 agents here running in parallel handling
20 independent checks at the same time. Now, subagents are a little
bit different from the other four terms because subagents
are a common pattern in modern agent systems but they don't really have
a formal standard document behind them. But the concept shows up
almost identically everywhere. I mean the very basic idea is you
have this big parent agent here.
[10:09] That parent agent spawns one or more child agents. The child gets its own fresh context. The child does whatever work it was told to do and then it returns a result
and the parent carries on with its context intact. So there we've got five terms. We've got agents.md and agent skills, which live
inside the agent and they shape how it behaves. We've got MCP and we've got A2A. That's how the agent reaches outwards
to tools and to other agents. And we've got subagents. That's how the agent handles the work
that doesn't fit into one context.
[10:50] That's what a frontier AI
agent actually looks like under the hood today.