Why does RAG still feel clunky in 2025?

Let’s just be honest for a second.
Everyone’s saying “RAG is the future.”
But… have you really tried building one that doesn’t fall apart on contact?


Most of what we call “RAG” today is still a fragile dance of glue code and faith:

  • One bad chunk split? Bye-bye relevance.
  • Vector DB latency? Now your agent sounds drunker than me.
  • Grounded answers? Sure, until someone asks “why” twice.

And if you’ve ever tried to scale this beyond a toy demo, you’ve probably hit one of these walls:

  1. Semantic mismatch – the model sounds fluent but isn’t actually reading the context right.
  2. Retriever overconfidence – grabbing something that feels close but is totally off.
  3. Unnatural prompt stitching – stuffing retrieved docs into the prompt like it’s a sandwich nobody ordered.

All of this gets worse when people assume “just add more tokens” will fix things.
Spoiler: it doesn’t. It just makes the model pretend better.


There’s also an elephant in the room:
The current generation of LLMs was never built with retrieval in mind.
We’re still trying to retrofit memory into an architecture that was trained to forget.


So… yeah. RAG sounds great. In practice, it’s still rough.
Maybe we should talk more openly about that.

Curious how others are navigating this.
Has anyone found setups that actually feel smooth and scalable?

2 Likes

Yep — totally agree with this framing.

At some point, I started realizing that even when retrieval is logically sound, the generation still slips. It’s like you’re building on factual memory, but the semantic scaffolding isn’t quite there — so the output ends up coherent on the surface, but not structurally grounded.

We’ve been playing with different ways to observe these tension points — especially when answers feel “right” but originate from a semantically shifted zone. Still just scratching the surface, but I love seeing others digging into the root architecture too, not just the patches.

Really appreciate this thread — super clarifying.

1 Like

Hello @Pimpcat-AU, I'm interested in your approach! Can you share a fragment of Python code that does your indexed lookup?

I was about to publish a topic on this trend :sweat_smile:

I'm working on a little project that searches a folder of PDF files. I have my RAG working, but I need to make my lookup more accurate.

2 Likes

```python
import os
from collections import defaultdict

# Build an inverted index from a folder of text files
def build_index(folder_path):
    index = defaultdict(set)  # word → set of filenames
    for filename in os.listdir(folder_path):
        if not filename.endswith('.txt'):
            continue
        filepath = os.path.join(folder_path, filename)
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read().lower()
        words = set(text.split())
        for word in words:
            index[word].add(filename)
    return index

# Search a query in the index (AND semantics: a file must
# contain every query word to match)
def search_index(index, query):
    query_words = query.lower().split()
    results = None
    for word in query_words:
        if word in index:
            if results is None:
                results = index[word].copy()
            else:
                results &= index[word]
        else:
            return set()  # a query word appears in no file
    return results or set()

# Example usage
folder = '/path/to/text/files'
index = build_index(folder)

query = "your search terms here"
matched_files = search_index(index, query)

print(f"Files matching '{query}':")
for f in matched_files:
    print(f)
```

6 Likes

I forgot to mention: chunking the data also speeds up indexing. Roughly like this, building on the index above; the function names and the 200-word chunk size are just illustrative, tune them to your documents:
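
```python
import os
from collections import defaultdict

# Rough sketch: index fixed-size word chunks instead of whole files,
# so a lookup points at a passage rather than an entire document.
# chunk_words / build_chunked_index and the 200-word size are
# illustrative choices, not tuned values.
def chunk_words(text, chunk_size=200):
    words = text.lower().split()
    for start in range(0, len(words), chunk_size):
        yield start // chunk_size, words[start:start + chunk_size]

def build_chunked_index(folder_path):
    index = defaultdict(set)  # word → set of (filename, chunk_id)
    for filename in os.listdir(folder_path):
        if not filename.endswith('.txt'):
            continue
        with open(os.path.join(folder_path, filename), encoding='utf-8') as f:
            text = f.read()
        for chunk_id, words in chunk_words(text):
            for word in set(words):
                index[word].add((filename, chunk_id))
    return index
```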

3 Likes

Thank you @Pimpcat-AU! I'll try your code to understand it, thank youuu :ok_hand:

3 Likes

Take a look at my post, I think you'll find it interesting:
K3D - The new paradigm for Knowledge.

1 Like

Can you link it please?

2 Likes

Sure, if my post is allowed:
K3D

1 Like

Hey, I took a look through your K3D/3D Knowledge repo. You’ve clearly put in a lot of work pulling together ideas from AI, 3D vector data, and spatial web tech. The range of research is solid, and your documentation covers a lot of ground. You’re tackling some real challenges in how AI can work with spatial and vector knowledge, especially with agents and immersive systems.

My main thought is that your work is at the cutting edge of what people are trying to do with 3D knowledge representation. It’s early days for this field, so the conceptual work and roadmaps make sense. I noticed things are still mostly in the research and planning stage, but that’s how these big shifts always start.

Just to share, I’m currently working on an image-based memory architecture that will be deployed in either my 3rd- or 4th-generation bots. I’m still finishing up my Gen 2 bots, so I haven’t had the time to fully complete and implement the new memory system yet. I know how to do it, there just aren’t enough hours in the day.

Overall, this is a good foundation. I recommend keeping your architecture modular so you can adapt if something better comes along. If you want to discuss deterministic storage or alternative memory systems, I’m happy to talk more.

Keep going, you’re on the right track.

4 Likes

I succeeded :slight_smile: oh wow, didn’t realize it took me 2 months… geez the days are a blur.

1 Like

You’re not wrong — most RAG systems fail because they treat retrieval as a feature instead of a contract.

What breaks in practice isn’t the vector DB or the LLM; it’s the absence of hard guarantees between stages:

  • Retrieval returns something, not the right thing

  • Chunking optimizes for embeddings, not semantics

  • Prompt stitching assumes relevance instead of proving it

  • And the model is forced to sound confident even when context is weak

In other words, most RAG pipelines optimize for fluency, not correctness.

What’s worked better for me is flipping the mental model:

  • Retrieval must be verifiable, not just similar

  • Context must be bounded, not “more tokens”

  • The system must be allowed to refuse to answer when evidence is insufficient

  • And success should be delayed until inputs are proven durable and relevant

Once you introduce refusal states, confidence thresholds, and explicit invariants (“this answer is grounded because X, Y, Z were verified”), RAG stops feeling fragile — but it also stops feeling magical.
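
To make that concrete, here's a stripped-down version of the gate I'm describing. The `retriever` and `llm` callables stand in for whatever your stack uses, and the thresholds are placeholders, not tuned values:

```python
# Minimal sketch of a refusal-first gate. `retriever` is assumed to
# return (chunk_text, similarity) pairs and `llm` to be any
# text-in/text-out completion function; thresholds are illustrative.
def answer_or_refuse(question, retriever, llm, min_score=0.75, min_hits=2):
    evidence = [(c, s) for c, s in retriever(question) if s >= min_score]
    if len(evidence) < min_hits:
        # Refusal state: not enough verified context to ground an answer.
        return {"refused": True,
                "reason": f"only {len(evidence)} chunk(s) cleared the "
                          f"{min_score} similarity threshold"}
    context = "\n\n".join(c for c, _ in evidence)
    prompt = ("Answer ONLY from the context below. If the context is "
              "insufficient, say exactly INSUFFICIENT.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    answer = llm(prompt)
    return {"refused": answer.strip() == "INSUFFICIENT",
            "answer": answer,
            "grounded_on": [s for _, s in evidence]}
```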

I agree with your last point especially: current LLMs weren’t trained with retrieval in mind. Retrofitting memory without governance just amplifies hallucinations.

RAG isn’t dead — but unguarded RAG absolutely is.

Curious whether others have experimented with:

  • retrieval confidence scoring

  • refusal-first agents

  • or separating “explanation” from “answer” phases (rough sketch below)
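
On that last one, the shape I have in mind is roughly this (helper names are mine, purely illustrative): commit to an answer strictly from the evidence first, then have a second pass explain it with citations, so fluency can't paper over missing grounding.

```python
# Rough sketch: two-phase generation. Phase 1 commits to an answer from
# the evidence alone; phase 2 explains it, citing chunks by [number].
# `llm` is assumed to be any text-in/text-out completion function.
def answer_then_explain(question, evidence_chunks, llm):
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(evidence_chunks))
    answer = llm(
        "Using ONLY the chunks below, answer the question in one or two "
        f"sentences.\n\nChunks:\n{numbered}\n\nQuestion: {question}"
    )
    explanation = llm(
        "Explain why the answer follows from the chunks, citing them by "
        "[number]. Add no new claims.\n\n"
        f"Chunks:\n{numbered}\n\nAnswer: {answer}"
    )
    return answer, explanation
```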

That’s where things started to feel scalable for me.

1 Like