
RAG is dead! Long live RAG!
Many tasks follow a similar pattern: you receive a request and then gather the right supporting information (“context”) to answer it. You often have to go search for the relevant context in documents, databases, spreadsheets, audio and video files, or even on the web. This process of finding and assembling the right information is what makes or breaks how well an AI agent (or a human) performs.
That’s why the 2020-2024 version of RAG is dead. Retrieval today isn’t just embedding and fetching text chunks. It’s evolved into a comprehensive search system that spans multiple data types and sources. Modern retrieval means giving agents the right context, wherever it lives, so they can reason effectively. It’s the backbone of true context engineering.
What are we searching for?
There is a lot of nuance in finding exactly the right information for the task at hand. This is best illustrated with a simple example: let’s assume we are an IT provider and a customer is inquiring about an office infrastructure upgrade with new hardware, network equipment, and managed support bundled under their existing contract.
To put a competitive offer together, we need to gather a lot of information:
- First we find the requested items in our product catalog. This requires very precise retrieval over very large datasets with product images, descriptions, and many other product details. If the customer asks for Dell UltraSharp U2723QE 27-inch 4K IPS monitors and we respond with a quote for the U2720Q model instead, we might lose the deal!
- We also need to retrieve the customer’s contract documents to check if there are any terms like discounts or delivery times we need to take into account. This also requires exact retrieval but this time over documents instead of rows in tables.
- If there were any sales calls with the customer, we want to comb through those and surface anything relevant for this deal.
- It is likely the customer is shopping around and inquiring with our competitors as well. To make sure our offer is competitive, we run a web search to check the prices of the requested items.
- Good references always help! We search for relevant projects we’ve done successfully in the past. While there might not be a precise match, the most important thing is that the type of project is similar enough to give the customer confidence we can handle theirs.
Once we’ve retrieved all the relevant supporting information, the agent or workflow components can reason over it and put the offer together.
It should be clear by now that even for a fairly standard customer request, we need to combine very different types of search: from high-precision retrieval over large datasets to web search, from searching through audio recordings to scraping web pages. We need a purpose-built search engine to automate all of this.
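To make this concrete, here is a minimal sketch of what combining those search types could look like in code. Every backend below is a stub that returns canned results, and all the names (catalog_search, contract_search, and so on) are hypothetical placeholders rather than a real library API; a production system would plug in real catalog, document, transcript, and web search services.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str   # "catalog", "contracts", "calls", "web", or "references"
    content: str
    score: float

def catalog_search(query: str) -> list[Hit]:
    # High-precision lookup: exact model numbers matter (U2723QE, not U2720Q).
    return [Hit("catalog", "Dell UltraSharp U2723QE 27-inch 4K IPS monitor", 1.0)]

def contract_search(query: str) -> list[Hit]:
    # Pull the customer's contract terms: discounts, delivery times, SLAs.
    return [Hit("contracts", "5% volume discount, 10-day delivery window", 0.9)]

def call_search(query: str) -> list[Hit]:
    # Comb through transcripts of past sales calls with this customer.
    return [Hit("calls", "Customer mentioned a Q3 budget deadline", 0.8)]

def web_search(query: str) -> list[Hit]:
    # Check competitor pricing for the requested items on the web.
    return [Hit("web", "U2723QE listed around $580 at a competitor", 0.7)]

def reference_search(query: str) -> list[Hit]:
    # Semantic match against past projects; a similar project type is enough.
    return [Hit("references", "2023 office rollout: 120 workstations plus managed support", 0.6)]

def gather_context(request: str) -> list[Hit]:
    backends = [catalog_search, contract_search, call_search, web_search, reference_search]
    hits = [hit for backend in backends for hit in backend(request)]
    return sorted(hits, key=lambda h: h.score, reverse=True)

if __name__ == "__main__":
    for hit in gather_context("office upgrade with U2723QE monitors and managed support"):
        print(f"[{hit.source}] {hit.content}")
```

Even this toy version makes the point: each source needs its own kind of search, and the results only become useful once they are merged and ranked together.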
Who's in charge of the search system?
The agentic search engine combines a lot of functionality: from browsing the web to full-text or semantic search across any data type. The order in which the different types of search are executed is also crucial: we first need to know which items in our catalog are requested before we can check their price online.
That’s why the agentic search engine is orchestrated by a dedicated sub-agent. This agent decides what to search for and how; it infers which queries can be run in parallel and which ones are sequential. The sub-agent also comes with validation, feedback and correction mechanisms to ensure quality results.
In a nutshell, the task of the search sub-agent is to compile all required information for the main agent to do its job.
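As an illustration, the sketch below shows one way such a sub-agent could sequence its queries, using hypothetical stub functions: the contract and call searches are independent and run in parallel, while the competitor price check has to wait for the catalog lookup because it needs the exact models first. The validation step is a trivial placeholder for the feedback and correction loop described above.

```python
import asyncio

async def catalog_lookup(request: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for a real catalog query
    return ["Dell UltraSharp U2723QE"]

async def contract_lookup(request: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for document retrieval
    return ["5% volume discount applies"]

async def call_lookup(request: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for transcript search
    return ["Customer mentioned a Q3 budget deadline"]

async def web_price_check(items: list[str]) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for a web search
    return [f"{item}: competitor price found" for item in items]

def validate(results: list[str]) -> bool:
    # Trivial placeholder: a real sub-agent would judge relevance,
    # flag contradictions, and re-plan queries when quality is poor.
    return len(results) > 0

async def search_subagent(request: str) -> list[str]:
    # Independent searches run in parallel...
    catalog, contracts, calls = await asyncio.gather(
        catalog_lookup(request), contract_lookup(request), call_lookup(request)
    )
    # ...but the price check is sequential: it needs the exact catalog items first.
    prices = await web_price_check(catalog)

    results = catalog + contracts + calls + prices
    if not validate(results):
        raise RuntimeError("Search results failed validation; re-plan the queries")
    return results

if __name__ == "__main__":
    for line in asyncio.run(search_subagent("office upgrade with 27-inch 4K monitors")):
        print(line)
```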

Why don’t we give all our data directly to the LLM?
The context window size of LLMs (the number of tokens you can feed the LLM) has grown drastically over the last few years. Google’s Gemini models, for instance, can now consume up to 1 million tokens. This raises the question: why don’t we just pass all our data directly to the LLM and let the agent figure things out? There are several reasons why this is a bad idea:
- 1 million tokens isn’t nearly enough for many use cases. Let’s say there are 100,000 items in our product catalog. Even if each item only has a name, a brief one-sentence description, and a product image (roughly 500 tokens per item), that results in over 50 million tokens; see the quick calculation after this list.
- It’s very expensive. Retrieving the right item from our catalog might cost us 1,000 tokens. Even if all the products fit within the 1 million token limit, it would cost up to 1,000x more to achieve the same goal! And this gap only widens as the dataset size increases. Caching content is very hard due to regular updates to the catalog.
- Dumping all the data in the model’s context window hurts performance, a phenomenon called “context bloat”. As the context becomes very large, the model can get distracted by irrelevant or, even worse, contradictory information.
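The back-of-the-envelope arithmetic behind the first two bullets looks like this; the ~500 tokens per catalog item is an assumption for illustration, not a measurement:

```python
# Rough numbers only: the per-item figure is an assumed average covering the
# name, a one-sentence description, and an encoded product image.
CATALOG_ITEMS = 100_000
TOKENS_PER_ITEM = 500
CONTEXT_WINDOW = 1_000_000   # e.g. a Gemini-class 1M-token window
RETRIEVAL_TOKENS = 1_000     # tokens needed when retrieval surfaces the right item

full_catalog = CATALOG_ITEMS * TOKENS_PER_ITEM
print(f"Full catalog: {full_catalog:,} tokens "
      f"({full_catalog // CONTEXT_WINDOW}x the context window)")

print(f"Stuffing a full 1M-token context instead of retrieving ~{RETRIEVAL_TOKENS:,} tokens "
      f"costs up to {CONTEXT_WINDOW // RETRIEVAL_TOKENS:,}x more input tokens per request")
```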
Search — the hidden power behind AI automation
A robust agentic search system is the key to unlocking many enterprise workflows. It’s an easily overlooked but critical piece of functionality when evaluating AI automation solutions. Without it, you can’t even gather the right information to do the task.
But beware: the hard part isn’t dragging a readily available RAG component into your favourite workflow tool. It’s combining the different types of search into a holistic search system that operates at scale, works across any data type, and is orchestrated effectively.
Ready to Deploy a Production-Ready Search System?
Focus on the job you want to automate. Let us worry about finding all the relevant information to get that job done.