AI. AI. AI. AI. AI. AI. You know, more agentic. Agentic capabilities. An AI agent. Agents. Agentic workflows. Agents. Agents. Agent. Agent. Agent. Agent. Agentic.
All right. Most explanations of AI agents are either too technical or too basic. This article is meant for people like me: you have zero technical background, but you use AI tools regularly, and you want to learn just enough about AI agents to see how they affect you.
In this article, we’ll follow a simple one-two-three learning path: building on concepts you already understand, like ChatGPT, then moving on to AI workflows, and finally AI agents, all the while using examples you’ll actually encounter in real life. And believe me when I tell you, those intimidating terms you see everywhere, like RAG or ReAct, are a lot simpler than you think.
Let’s get started.
Kicking things off at level one: large language models.
Popular AI chatbots like ChatGPT, Google Gemini, and Claude are applications built on top of large language models, or LLMs, and they’re fantastic at generating and editing text. Here’s a simple visualization. You, the human, provide an input and the LLM produces an output based on its training data.
For example, if I were to ask ChatGPT to draft an email requesting a coffee chat, my prompt is the input and the resulting email—that’s way more polite than I would ever be in real life—is the output. So far so good, right? Simple stuff.
But what if I asked ChatGPT when my next coffee chat is? Even without seeing the response, both you and I know ChatGPT is going to fail, because it doesn’t know that information. It doesn’t have access to my calendar.
This highlights two key traits of large language models. First, despite being trained on vast amounts of data, they have limited knowledge of proprietary information like our personal information or internal company data. Second, LLMs are passive. They wait for our prompt and then respond. Right? Keep these two traits in mind moving forward.
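If it helps to see this as code, here’s a minimal sketch of that input-output relationship. The `llm()` helper is purely hypothetical, a stand-in for whatever model API a chatbot calls behind the scenes:

```python
# A minimal sketch of level one: the LLM is passive and only knows its
# training data. llm() is a hypothetical stand-in for any chat-model API.

def llm(prompt: str) -> str:
    # In reality this would be a call to a model provider's API.
    return f"(model-generated text for: {prompt!r})"

# Nothing happens until a human supplies an input...
email = llm("Draft a polite email requesting a coffee chat.")
print(email)

# ...and questions about private data can only be guessed at:
print(llm("When is my next coffee chat?"))  # no calendar access, so it fails
```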
Moving to level two: AI workflows.
Let’s build on our example. What if I, a human, told the LLM, “Every time I ask about a personal event, perform a search query and fetch data from my Google Calendar before providing a response”? With this logic implemented, the next time I ask, “When is my coffee chat with Elon Husky?” I’ll get the correct answer, because the LLM will first go into my Google Calendar to find that information.
But here’s where it gets tricky. What if my next follow-up question is, “What will the weather be like that day?” The LLM will now fail to answer, because the only path we told it to follow is to search my Google Calendar, which has no information about the weather. This is a fundamental trait of AI workflows: they can only follow predefined paths set by humans. And if you want to get technical, this path is also called the control logic.
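For the curious, here’s a rough sketch of what that control logic might look like in code. The `fetch_calendar_events` helper and the keyword check are my own hypothetical stand-ins; the point is that a human wrote the branch, so the path is fixed:

```python
# A sketch of an AI workflow's control logic: a human hard-codes the path.

def llm(prompt: str) -> str:
    return f"(model answer for: {prompt!r})"  # same hypothetical stand-in

def fetch_calendar_events() -> str:
    # Hypothetical helper; imagine a Google Calendar API call here.
    return "Coffee chat with Elon Husky, Tuesday at 10 a.m."

def answer(question: str) -> str:
    if "coffee chat" in question.lower():  # the rule I wrote by hand
        events = fetch_calendar_events()
        return llm(f"Calendar says: {events}\nQuestion: {question}")
    return llm(question)  # everything else hits the bare LLM

print(answer("When is my coffee chat with Elon Husky?"))  # works
print(answer("What will the weather be like that day?"))  # fails: no weather path
```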
Pushing my example further, what if I added more steps to the workflow by allowing the LLM to access the weather via an API and then, just for fun, using a text-to-audio model to speak the answer? (A synthesized voice says: “The weather forecast for seeing Elon Husky is sunny with a chance of being a good boy.”)
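Sketching those extra steps the same way, with `get_weather` and `speak` as hypothetical stand-ins for a weather API and a text-to-audio model, the chain grows, but every link is still placed by me:

```python
# More human-defined steps bolted onto the same workflow.

def get_weather(date: str) -> str:
    return "sunny"  # imagine a real weather API call here

def speak(text: str) -> None:
    print(f"[synthesized voice] {text}")  # imagine a text-to-speech model

date = "Tuesday"  # handed over from the calendar step
speak(f"The forecast for seeing Elon Husky on {date} is {get_weather(date)}.")
```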
Here’s the thing. No matter how many steps we add, this is still just an AI workflow. Even if there were hundreds or thousands of steps, if a human is the decision maker, there is no AI agent involvement.
Pro tip: retrieval-augmented generation, or RAG, is a fancy term that gets thrown around a lot. In simple terms, RAG is a process that helps AI models look things up before they answer, like accessing my calendar or the weather service. Essentially, RAG is just a type of AI workflow.
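If you want to see the shape of RAG in code, here’s a toy sketch. A real system would use embeddings and a vector database; simple keyword matching over an in-memory list is enough to show the retrieve-then-generate idea:

```python
# A toy RAG loop: Retrieve relevant text, Augment the prompt, Generate.

documents = [
    "Coffee chat with Elon Husky on Tuesday at 10 a.m.",
    "Dentist appointment on Friday at 2 p.m.",
]

def llm(prompt: str) -> str:
    return f"(model answer for: {prompt!r})"  # hypothetical stand-in

def retrieve(question: str) -> list[str]:
    # Keyword overlap stands in for real vector search.
    words = set(question.lower().split())
    return [d for d in documents if words & set(d.lower().split())]

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # the model answers with the looked-up facts in hand

print(rag_answer("When is my coffee chat?"))
```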
By the way, I have a free AI toolkit that cuts through the noise and helps you master essential AI tools and workflows. I’ll leave a link to that down below.
Here’s a real-world example. Following Helena Liu’s amazing tutorial, I created a simple AI workflow using Make.com. First, I use Google Sheets to compile links to news articles. Second, I use Perplexity to summarize those articles. Third, using a prompt I wrote, I ask Claude to draft a LinkedIn and an Instagram post. Finally, I schedule the whole thing to run automatically every day at 8 a.m.
As you can see, this is an AI workflow because it follows a predefined path set by me. Step one, you do this. Step two, you do this. Step three, you do this. And finally, remember to run daily at 8 a.m.
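If you rebuilt that Make.com scenario in plain code, the predefined path would look something like this. Every helper below is a hypothetical placeholder for the real Google Sheets, Perplexity, and Claude integrations:

```python
# The same predefined path, sketched as code: step 1, step 2, step 3, schedule.

def read_article_links_from_sheet() -> list[str]:
    return ["https://example.com/ai-news"]  # imagine the Google Sheets API

def summarize_with_perplexity(url: str) -> str:
    return f"(summary of {url})"  # imagine a Perplexity API call

def draft_posts_with_claude(summaries: list[str]) -> dict[str, str]:
    # Imagine a Claude call that uses the fixed prompt I wrote.
    return {"linkedin": "(post draft)", "instagram": "(post draft)"}

def daily_job() -> dict[str, str]:
    links = read_article_links_from_sheet()                    # step 1
    summaries = [summarize_with_perplexity(u) for u in links]  # step 2
    return draft_posts_with_claude(summaries)                  # step 3

# Step 4: a scheduler runs daily_job() every day at 8 a.m.
```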
One last thing: if I test this workflow and I don’t like the final output of the LinkedIn post (for example, if it’s not funny enough, and I’m naturally hilarious, right?), I’d have to manually go back and rewrite the prompt for Claude. This trial-and-error iteration is currently done by me, a human. So keep that in mind moving forward.
All right, level three: AI agents.
Continuing the Make.com example, let’s break down what I’ve been doing so far as the human decision maker. With the goal of creating social media posts based on news articles, I need to do two things. First, reason, or think about the best approach: I need to compile the news articles, then summarize them, then write the final posts. Second, take action using tools: I need to find and link those news articles in Google Sheets, use Perplexity for real-time summarization, and use Claude for copywriting.
So, here’s the most important sentence in this entire article: the one massive change that has to happen for this AI workflow to become an AI agent is for me, the human decision maker, to be replaced by an LLM.
In other words:
- The AI agent must reason. “What’s the most efficient way to compile these news articles? Should I copy and paste each article into a word document? No, it’s probably easier to compile links to those articles and then use another tool to fetch the data. Yes, that makes more sense.”
- The AI agent must act, aka do things via tools. “Should I use Microsoft Word to compile links? No, inserting links directly into rows is way more efficient. What about Excel? Hmm. The user has already connected their Google account with Make.com, so Google Sheets is the better option.”
Pro tip: because of this, the most common configuration for AI agents is the ReAct framework. All AI agents must Reason and Act. So ReAct. Sounds simple once we break it down, right?
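Here’s a toy version of that Reason-and-Act loop. The `llm_decide` function and the tools are hypothetical stand-ins; in a real agent, an LLM call would produce the thought and pick the action. The structure (reason, act, observe, repeat) is the ReAct part:

```python
# A toy ReAct loop: the LLM, not the human, chooses each next step.

def llm_decide(goal: str, observations: list[str]) -> tuple[str, str]:
    # Hypothetical: a real agent would ask the model "what next?" and
    # parse its reply into a (thought, action) pair.
    if not observations:
        return ("I should gather the articles first.", "compile_links")
    if len(observations) == 1:
        return ("Links in hand; now summarize them.", "summarize")
    return ("Summaries done; time to write the post.", "finish")

tools = {
    "compile_links": lambda: "links gathered",
    "summarize": lambda: "summaries written",
}

def run_agent(goal: str) -> str:
    observations: list[str] = []
    while True:
        thought, action = llm_decide(goal, observations)  # Reason
        print(f"Thought: {thought}")
        if action == "finish":
            return "final post drafted"
        observations.append(tools[action]())              # Act, then observe

print(run_agent("Create social posts from today's news"))
```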
A third key trait of AI agents is their ability to iterate. Remember when I had to manually rewrite the prompt to make the LinkedIn post funnier? I, the human, probably need to repeat this iterative process a few times to get something I’m happy with, right? An AI agent will be able to do the same thing autonomously.
In our example, the AI agent would autonomously add in another LLM to critique its own output. “Okay, I’ve drafted V1 of a LinkedIn post. How do I make sure it’s good? Oh, I know. I’ll add another step where an LLM will critique the post based on LinkedIn best practices. And let’s repeat this until the best practices criteria are all met.” And after a few cycles of that, we have the final output.
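As a sketch, that self-critique cycle is just a loop whose exit condition is decided by a model rather than by me. The `draft`, `critique`, and `revise` helpers below are hypothetical stand-ins for LLM calls:

```python
# A toy draft-critique-revise loop with a safety cap on iterations.

def draft(topic: str) -> str:
    return f"V1 LinkedIn post about {topic}"  # imagine an LLM call

def critique(post: str) -> list[str]:
    # Imagine an LLM checking the post against LinkedIn best practices
    # and returning a list of problems (empty list = criteria met).
    return [] if "hook" in post else ["needs a stronger hook"]

def revise(post: str, problems: list[str]) -> str:
    return post + " (revised: added a hook)"  # imagine an LLM rewrite

def polish(topic: str, max_rounds: int = 3) -> str:
    post = draft(topic)
    for _ in range(max_rounds):
        problems = critique(post)
        if not problems:  # the critic is satisfied, so stop iterating
            break
        post = revise(post, problems)
    return post

print(polish("AI agents"))
```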
That was a hypothetical example, so let’s move on to a real-world AI agent. Andrew Ng is a preeminent figure in AI, and he created a demo website that illustrates how an AI agent works. I’ll link the full video down below, but when I search for a keyword like “skier,” the AI vision agent in the background first reasons about what a skier looks like (a person on skis, going really fast in the snow, for example) and then acts by looking at clips in the video footage, identifying what it thinks a skier is, indexing that clip, and returning it to us.
Although this might not feel impressive, remember that an AI agent did all of that, instead of a human reviewing the footage beforehand, manually identifying the skier, and adding tags like “skier, mountain, ski, snow.” The programming is obviously a lot more technical and complicated than what we see on the front end, but that’s the point of this demo, right? The average user like me wants a simple app that just works, without having to understand what’s going on in the back end.
Speaking of examples, I’m also building my very own basic AI agent using [N/A]. So, let me know in the comments what type of AI agent you’d like me to make a tutorial on next.
To wrap up, here’s a simplified visualization of the three levels we covered today.
- Level one: We provide an input and the LLM responds with an output. Easy.
- Level two, AI workflows: We provide an input and tell the LLM to follow a predefined path that may involve retrieving information from external tools. The key trait here is that the human programs a path for the LLM to follow.
- Level three, AI agents: The AI agent receives a goal, and the LLM reasons about how best to achieve that goal, takes action using tools to produce an interim result, observes that interim result, decides whether further iterations are required, and produces a final output that achieves the initial goal. The key trait here is that the LLM is the decision maker in the workflow.
If you found this helpful, you might want to learn how to build a prompts database in Notion next. See you in the next video. In the meantime, have a great one.