Navigating AI Frontier – Before You Blame AI Hallucination… Check Your RAG!

By Shammy Narayanan

An over-enthusiastic husband who suddenly decided to “surprise” his spouse with a homemade pizza? Yes, the same species that believes YouTube + confidence = MasterChef. So, our hero rolls up his sleeves, kneads the dough, and then… hits a roadblock: How on earth do you make the toppings stick? Like any modern-day culinary genius, he consults his kitchen oracle, GPT. And voilà, the AI responds with divine wisdom: “Use glue.” Yes, Fevicol-on-pizza level brilliance.

Luckily, a tiny flicker of common sense prevented him from creating a biohazard in the name of romance. Most people would laugh this off as a classic “AI hallucination,” but… spoiler alert, it wasn’t. This disaster was sponsored by a poorly implemented RAG system. The model scraped a Reddit joke (that obviously got upvotes for comedy), treated it as gospel truth, and served it hot. Welcome to the world where RAG can either elevate AI or make you eat glue-flavored pizza.

Retrieval-Augmented Generation (RAG) is the secret sauce of modern AI: the ingredient that fetches the right information from your internal knowledge sources so the model doesn’t respond like it’s living under a rock. Unlike old-school keyword search, which only spits out what you literally typed, RAG uses vector search to understand what you meant to ask (or at least makes a decent attempt at it).
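
To make that contrast concrete, here is a minimal Python sketch. The embed() function is a toy hashed bag-of-words stand-in (a real RAG stack would use a learned embedding model and a vector database), so read it as an illustration of the mechanics, not a reference implementation:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for an embedding model. A real RAG system uses a learned
    model so that texts with similar *meaning* land close together in vector
    space; this hashed bag-of-words only illustrates the plumbing."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec

def keyword_search(query: str, docs: list[str]) -> list[str]:
    # Old-school retrieval: return documents that literally contain a query word.
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def vector_search(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    # RAG-style retrieval: rank documents by cosine similarity to the query vector.
    q = embed(query)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]
```

The difference is the question each one asks: keyword search asks “does this document contain these words?”, while vector search asks “does this document mean something close to the query?”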

No doubt, it’s miles ahead of the primitive “find text, show text” era, but RAG is far from a plug-and-play miracle. It demands thoughtful design, and it’s definitely not a “one-size-fits-all” formula. A standalone FAQ, where every Q&A lives in its own little bubble, cannot be treated the same as a slow-burning mystery novel, where a subtle clue dropped in Chapter 1 unlocks the revelation in Chapter 18. Get the implementation wrong, and you’ll still miss the bull’s-eye, not because the algorithm failed, but because the design did.

When designing a RAG system, three parameters can make or break the entire setup. Think of them as the seasoning levels in cooking: get them right, and the dish sings; get them wrong, and you’ll wonder why the recipe tasted better on YouTube.

  1. Chunk Size – How much do you cut at a time?
    Chunk size refers to how big each piece of text is when you “slice” your document for indexing. Too small, and the model gets snackable fragments with no context; too large, and it gets a buffet plate it can’t digest.

How to decide?

  • For short, standalone content (e.g., FAQs, product descriptions): smaller chunks (150–300 words) work best.
  • For context-heavy content (e.g., research papers, novels, policy docs): go larger (400–800 words) to preserve continuity.
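
To make the “slicing” concrete, here is a minimal word-count-based splitter (plain Python; production pipelines usually split on sentences or tokens and respect document structure, so treat the numbers as illustrative):

```python
def chunk_by_words(text: str, chunk_size: int = 250) -> list[str]:
    """Split a document into chunks of roughly `chunk_size` words.
    ~150-300 words suits standalone content such as FAQs;
    ~400-800 words suits context-heavy documents."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```
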
  2. Overlap Size – How much do we repeat between chunks?
    A small portion of text is intentionally repeated between consecutive chunks. This ensures context doesn’t get chopped mid-sentence like a careless editor snipping scenes from a movie.

How to decide?

  • If the document has smooth narrative flow or dependency between sections (e.g., legal docs, stories, manuals): use higher overlap (15–25% of chunk size).
  • If each chunk is mostly independent (e.g., FAQs, blogs, encyclopedic content): keep overlap small (5–10%).
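
Building on the splitter above, overlap simply means the window advances by less than a full chunk, so the tail of one chunk reappears at the head of the next. Again a word-based sketch, not a production splitter:

```python
def chunk_with_overlap(text: str, chunk_size: int = 500,
                       overlap_pct: float = 0.20) -> list[str]:
    """Split into `chunk_size`-word chunks where consecutive chunks share
    roughly `overlap_pct` of their words (5-10% for independent content,
    15-25% for narrative or interdependent content)."""
    words = text.split()
    overlap = int(chunk_size * overlap_pct)
    step = max(chunk_size - overlap, 1)       # how far the window slides each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end of the text
            break
    return chunks
```
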
  3. Number of Chunks Retrieved – How many pieces should the model look at before answering?
    This controls how much context gets pulled during a query. Fetch too few, and you get a half-baked answer. Fetch too many, and you may confuse or dilute the response (like inviting 20 people to advise you on a life decision: utter chaos).

How to decide?

  • For fact-based or direct answers (support docs, troubleshooting guides): 3–5 chunks usually do the job.
  • For deep reasoning, storytelling, or multi-hop answers (research, legal, academic content): 5–10 chunks provide richer grounding.
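
Putting the pieces together, the retrieval step scores every indexed chunk against the query and hands only the top k to the model. Here is a minimal brute-force sketch that reuses the toy embed() helper from the earlier example; a real deployment would query a vector database rather than loop over every chunk:

```python
import numpy as np

def retrieve_top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query.
    k of 3-5 suits direct, fact-based answers; 5-10 gives richer grounding
    for multi-hop or reasoning-heavy questions."""
    q = embed(query)  # toy embed() from the earlier sketch; swap in a real model
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Usage: the retrieved chunks become the context that gets stuffed into the prompt.
# context = "\n\n".join(retrieve_top_k("How do I reset my password?", chunks, k=4))
```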

By now, it’s clear that RAG isn’t a copy-paste feature; its success hinges entirely on the type of content and the people querying it. And in the real world, documentation is rarely a neatly labelled box of FAQs. It’s a chaotic buffet of policies, guides, narratives, tables, research, and tribal knowledge. So before an AI engineer jumps into implementing RAG, they must first understand what kind of documents they’re dealing with and how users will search for information. Only then can they craft a RAG system that’s truly effective.

Of course, there’s a growing school of thought boldly claiming: “RAG is dying; LLM context windows are expanding, so just dump all your data into the prompt!” Sounds tempting, right? If only life (and architecture) were that simple. Is this really the end of RAG?

A spicy question, and we’ll unpack the architectural reality behind this claim in the next part of this series. Until then, stay tuned.

Shammy Narayanan is the Vice President of Platform, Data, and AI at Welldoc. Holding 11 cloud certifications, he combines deep technical expertise with a strong passion for artificial intelligence. With over two decades of experience, he focuses on helping organizations navigate the evolving AI landscape. He can be reached at shammy45@gmail.com