
Knowledge Graphs, Context Graphs, and Ontologies: When Grounding Your AI Needs More Than Search
Vector RAG quietly fails on questions about how things connect. What knowledge graphs, context graphs, and ontologies actually are – and how to decide what your use case really needs, without over-building.
You shipped a retrieval-augmented generation (RAG) assistant, and it works. People use it. Ask it for the parental leave policy or a summary of a contract clause, and it answers well. But there's a quieter problem underneath: when someone asks a question that depends on how things in your organisation connect to each other, the assistant gives a fluent, confident, and shallow answer – and nobody notices.
This piece is about that gap. You'll learn why vector-based RAG quietly underperforms on a large class of real business questions, what knowledge graphs, ontologies, and context graphs actually are (they are three different things, and the market blurs them), and how to decide what your next use case genuinely needs – without over-engineering an enterprise ontology or dismissing the idea on reputation alone. The organising idea is simple: similarity, relationships, and meaning are not the same thing, and each needs a different tool.
We've built and grounded enough of these systems to say this plainly: the decision is rarely "graphs, yes or no". It's "for this specific question, what does the answer actually require?" That distinction is the whole article.
The quiet failure of "good enough" RAG
Vector RAG handles document lookup well. That's because it works by similarity: it converts your question into a numerical representation, finds text in your knowledge base whose representation is closest, and hands those passages to the model. "Find me the passage about X" is exactly what that mechanism is for.
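To make that mechanism concrete, here is a minimal sketch of the retrieval step. The bag-of-words "embedding" and the three documents are stand-ins for a real embedding model and knowledge base; the only point is that ranking is driven by textual similarity to the question.

```python
# Minimal sketch of vector-style retrieval: rank passages by similarity to the
# question. The embed() function is a toy stand-in for a real embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "The parental leave policy grants 26 weeks of paid leave.",
    "System A depends on Database B for nightly reconciliation.",
    "Database B is scheduled for decommissioning in Q3.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Similarity surfaces the passages that look like the question. It has no way
# to follow the System A -> Database B -> decommissioning chain as a chain.
print(retrieve("What is the parental leave policy?"))
```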
The trouble starts with a different shape of question. Some questions are relationship-dependent: which customers are affected if we retire this product? What systems depend on this database? Others are rule-dependent: was this exception handled the way policy requires? Who is allowed to approve this? Why was a similar request declined last quarter? The right answer to these is assembled from several documents, joined by a relationship that no single document ever states.
Vector RAG does not error out on these. It returns a plausible, fluent, confidently worded answer that is shallow or wrong. That is more dangerous than a visible failure, because there is nothing to catch.
The business cost scales with what the answer is used for. A shallow document-lookup answer costs a few minutes. A shallow dependency or precedent answer – acted on by a person, or worse, by an agent – costs a wrong decision: an unflagged customer, a missed downstream impact, an approval that breached policy. And the cost rarely shows up in your usage metrics. It surfaces as eroded trust. We see this pattern repeatedly in client work: teams quietly stop asking the assistant the hard questions and route them back to humans, so the AI delivers on the cheap questions and silently fails on the expensive ones. By the time it shows up, it looks like low adoption – when the real problem was architectural all along.
One public sector client we worked with had a RAG assistant that handled policy questions well – staff trusted it to answer "what does the regulation say about this". But when someone asked which downstream services a proposed eligibility change would affect, it returned a confident, tidy list that missed two of the affected programmes. Nobody noticed until a programme lead spotted that their own area wasn't on it. The assistant hadn't failed loudly; it had quietly stopped being trustworthy for exactly the questions that carried the most risk.
Here is the part that matters most for your architecture decisions: a bigger context window, better embeddings, or a stronger model will not fix this. It is not a tuning problem. Similarity search structurally cannot follow a chain of connections, because two facts can both be essential to an answer while sitting in documents that are not textually similar to the question or to each other.
That structural limit is what knowledge graphs, ontologies, and context graphs address. They are not the same thing, so it's worth being precise.
Three things people call "the same thing" – and shouldn't
The market uses these three terms loosely. Keeping them distinct is half the value of understanding them at all – and in our experience, the conversations that go wrong usually go wrong here, before a single line of architecture is drawn.
Knowledge graph
A knowledge graph is a persistent, structured store of the entities your organisation cares about – customers, systems, contracts, people – and the relationships between them. "System A depends on Database B" is not a sentence buried in a document; it is a piece of queryable data. In a knowledge graph, relationships are first-class facts you can follow. It is, in plain terms, a store of what is connected to what.
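As a minimal illustration, with the entity and relation names invented, this is what it means for a relationship to be data you can query rather than a sentence you hope to retrieve:

```python
# Relationships stored as facts (subject, relation, object), not prose.
triples = [
    ("System A", "DEPENDS_ON", "Database B"),
    ("Billing Service", "DEPENDS_ON", "System A"),
    ("Database B", "OWNED_BY", "Ops Team"),
]

# "What depends directly on Database B?" is answered by matching facts,
# not by hoping a passage happens to mention it.
print([s for s, p, o in triples if p == "DEPENDS_ON" and o == "Database B"])
```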
Ontology
An ontology is the layer above the graph. It defines what the entities and relationships mean – the formal definitions, rules, and constraints a machine can reason over.
A graph records that "an Order is linked to a Customer". The ontology says what an Order is: that it must have exactly one Customer, that "high-value" means above a particular threshold, that a high-value order requires manager approval. A graph stores the relationships; an ontology defines what they mean and what rules apply. A graph without an ontology is connected data with no agreed meaning. An ontology without a graph is rules with nothing to apply them to.
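A minimal sketch of the difference, with the threshold, field names, and roles invented for illustration: the graph would hold the order-to-customer link; the ontology-style layer below holds what must be true about it.

```python
# Ontology-style constraints expressed as checks a machine can apply.
HIGH_VALUE_THRESHOLD = 50_000  # assumption: "high-value" as defined by the ontology

def violations(order: dict) -> list[str]:
    problems = []
    # Constraint: an Order must have exactly one Customer.
    if len(order.get("customers", [])) != 1:
        problems.append("order must have exactly one customer")
    # Rule: a high-value order requires manager approval.
    if order["value"] > HIGH_VALUE_THRESHOLD and order.get("approved_by_role") != "manager":
        problems.append("high-value order requires manager approval")
    return problems

order = {"value": 80_000, "customers": ["ACME Ltd"], "approved_by_role": None}
print(violations(order))  # ['high-value order requires manager approval']
```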
Context graph
A context graph is neither a store nor a rule set. It is a runtime artefact: the task-shaped, connected slice of information assembled for a single AI call to do one specific job right now. If the knowledge graph is the whole map, the context graph is the route for one trip. It is drawn from the knowledge graph, shaped by the ontology's rules and permissions, used, and then discarded or refreshed.
"Context graph" is a less standardised term than the other two. We use it deliberately and precisely – not because there's a single industry authority to cite, but because the runtime layer needs a name, and in our agent work it's the layer that does the most quiet heavy lifting.
How they layer together
Picture an agent approving a purchase order. The knowledge graph holds the persistent facts: this Order, its Customer, the customer's region, the regional manager, prior orders, related support tickets – all as entities and relationships. The ontology holds the meaning and rules: an order over the threshold is "high-value"; a high-value order needs approval from a manager in the customer's region; an agent without an approval role cannot execute that approval. The context graph is what the agent actually assembles for this one task: just this order, its value, this customer, this region, this manager, the one rule that applies, and the agent's own permission. It reasons over that slice – not over the entire organisational graph.
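Here is a minimal sketch of that layering, using networkx as an in-memory stand-in for a real graph store. The entities, the single rule, and the one-path slice are all illustrative assumptions.

```python
# Assemble a context graph for one task: pull only the entities and the rule
# that matter for approving this one order, not the whole organisational graph.
import networkx as nx

kg = nx.DiGraph()  # the persistent knowledge graph (illustrative facts)
kg.add_edge("Order 1042", "ACME Ltd", relation="PLACED_BY")
kg.add_edge("ACME Ltd", "APAC", relation="IN_REGION")
kg.add_edge("APAC", "J. Rewi", relation="MANAGED_BY")
kg.add_edge("ACME Ltd", "Ticket 77", relation="HAS_TICKET")  # irrelevant to this task
kg.nodes["Order 1042"]["value"] = 80_000

def assemble_context(order_id: str) -> dict:
    customer = next(v for _, v, d in kg.out_edges(order_id, data=True)
                    if d["relation"] == "PLACED_BY")
    region = next(v for _, v, d in kg.out_edges(customer, data=True)
                  if d["relation"] == "IN_REGION")
    manager = next(v for _, v, d in kg.out_edges(region, data=True)
                   if d["relation"] == "MANAGED_BY")
    return {
        "order": order_id,
        "value": kg.nodes[order_id]["value"],
        "customer": customer,
        "region": region,
        "approver": manager,
        "rule": "high-value orders need approval from a manager in the customer's region",
    }

print(assemble_context("Order 1042"))  # the task-shaped slice the agent reasons over
```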
The questions vector RAG quietly gets wrong
It helps to recognise the specific question types where similarity search falls down. Three come up repeatedly.
Multi-hop dependency. "If we decommission this database, what breaks?" Answering this means following a chain – database to systems, systems to services, services to teams – and each link typically lives in a different document. Similarity search retrieves the first hop and stops, because it has no mechanism to follow a connection.
Precedent and rule-dependent. "Can this exception be approved, and by whom?" The answer depends on policy and on prior decisions. The relevant policy clause and the precedent case may share almost no vocabulary with the question, so similarity simply doesn't surface them.
Impact and blast radius. "Which customers are affected if we change this pricing tier?" This needs traversal – pricing tier to contracts to customers. Vector RAG instead returns text about pricing tiers and produces a plausible, confident, incomplete answer.
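To make the first of those question types concrete, here is a minimal traversal sketch: once "depends on" is an edge rather than a sentence, the blast radius is a reachability query. All names are invented.

```python
# Multi-hop dependency as a graph reachability question.
import networkx as nx

deps = nx.DiGraph()  # edge u -> v means "u depends on v"
deps.add_edge("Customer Portal", "Billing Service")
deps.add_edge("Billing Service", "System A")
deps.add_edge("System A", "Database B")

def blast_radius(target: str) -> set[str]:
    # Everything that can reach the target is affected if it is retired.
    return nx.ancestors(deps, target)

print(sorted(blast_radius("Database B")))
# ['Billing Service', 'Customer Portal', 'System A']
```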
In every case the failure is silent. There's no error, just a fluent answer that looks complete. That is exactly why the practical first step we recommend later is to audit your assistant's answers to relationship and rule questions specifically – it's the cheapest, most honest diagnosis you can run.
GraphRAG: the pragmatic middle ground
The good news is that you don't have to choose between vector search and a graph. GraphRAG is the hybrid most teams land on – and it's where most of our own builds end up too.
GraphRAG is a technique, not a product. It combines two retrieval strategies: vector search for "find text about X", and graph traversal for "follow the relationships from X". At query time you use one or both, then hand the combined, structured context to the model. The term was popularised by Microsoft Research's GraphRAG work, but the pattern itself is general and not tied to any one vendor.
Teams land here for two practical reasons. First, real workloads are mixed – a single session contains simple lookups and relationship questions. Pure vector RAG fails the relationship ones; a pure graph is overkill for the lookups; a hybrid handles both. Second, it is incremental. A team with working vector RAG can add a graph alongside it and route relationship-heavy queries to traversal. That is far easier than rebuilding everything on a graph, and in practice it's the path that survives a budget conversation.
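A minimal sketch of that routing idea follows. The keyword classifier and placeholder retrievers are stand-ins for what would, in practice, be an LLM routing step, an embedding index, and a graph query.

```python
# GraphRAG-style hybrid retrieval: route lookups to vector search, relationship
# questions to graph traversal, and hand the combined context to the model.
def classify(question: str) -> str:
    relational = ("depends", "affected", "impact", "connected", "downstream")
    return "graph" if any(w in question.lower() for w in relational) else "vector"

def vector_retrieve(question: str) -> list[str]:
    return ["<top passages by embedding similarity>"]   # placeholder

def graph_retrieve(question: str) -> list[str]:
    return ["<facts found by traversing the knowledge graph>"]  # placeholder

def build_context(question: str) -> list[str]:
    context = vector_retrieve(question)
    if classify(question) == "graph":
        context += graph_retrieve(question)
    return context

print(build_context("Which services are affected if we retire Database B?"))
```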
A brief practitioner aside on tooling, since clients always ask: when we build the graph layer for these systems, our enterprise-grade default is Neo4j, and on Azure-centric client stacks we use Azure Cosmos DB for Apache Gremlin. That's a fit-to-the-existing-stack decision, not a verdict on which is "best" – the architecture choices in this piece matter far more than the database badge.
Be clear-eyed about the costs. You need a knowledge graph, which means extracting entities and relationships from messy source data – LLMs make this cheaper, but the extraction is noisy and needs validation. A graph that goes stale will confidently return wrong relationships, so maintenance is a real, ongoing cost. A hybrid is slower per query, because it adds retrieval calls and a routing step. And that routing step is itself an LLM decision that can misclassify a query. GraphRAG is a sound and increasingly adopted technique with fast-maturing tooling – but it is not yet as turnkey as plain vector RAG, and you should plan for that.
From assistants to agents: context assembly is the real bottleneck
The decision sharpens considerably once you move from assistants to agents.
With an assistant, the question is "what document is relevant?" With an agent – something that plans, acts, and iterates – the question becomes "what does this agent need to know, right now, to take this action safely?" That is no longer a document-relevance problem.
As models have become strong reasoners, the binding constraint has moved. In our assessment – and this matches what we see across agent builds – the model is usually capable enough; agent failures more often come from the wrong context: too much, too little, stale, or missing the one constraint that should have stopped an action. Context assembly, not model reasoning, is where most agent failures originate.
This is the context graph's job. For each task it assembles the task-shaped subgraph – the relevant entities, the relationships, the applicable rules, the agent's permissions. Done well, it prevents two opposite failures: dumping the whole knowledge graph into the context window, which is expensive, slow, and drowns the model; and retrieving too narrowly and missing a dependency.
The ontology is what makes this safe to act on. It enables three things for agents. Action constraints: you encode that a high-value approval needs a manager role, or that an action is forbidden under certain conditions, and the agent checks the rule before acting rather than relying on a well-worded prompt. Permissions: access is enforced at the level of individual entities and relationships, so the context graph only ever contains what the agent is allowed to see and act on. Institutional memory: prior decisions and their justifications become queryable, connected data, so the agent can ask "has a similar exception been approved before, and why?"
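A minimal sketch of the first of those capabilities, an action constraint checked before execution. The threshold, role names, and action name are illustrative assumptions; a real system would read them from the ontology rather than hard-code them.

```python
# A pre-action guard: the agent checks the rule and its own permission before
# acting, rather than relying on a well-worded prompt.
HIGH_VALUE_THRESHOLD = 50_000  # assumption: defined by the ontology

def can_execute(action: str, agent_roles: set[str], order_value: float) -> tuple[bool, str]:
    if action == "approve_order":
        if "approver" not in agent_roles:
            return False, "agent lacks an approval role"
        if order_value > HIGH_VALUE_THRESHOLD and "manager" not in agent_roles:
            return False, "high-value approval requires the manager role"
    return True, "allowed"

allowed, reason = can_execute("approve_order", agent_roles={"approver"}, order_value=80_000)
print(allowed, reason)  # False high-value approval requires the manager role
```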
These are enabled capabilities, not free ones – they depend on a maintained ontology and graph. They are demonstrated today in narrow, scoped deployments. Fully autonomous, enterprise-wide agent reasoning over a rich ontology is still emerging; we'd treat the broad version as aspirational and the scoped, human-in-the-loop version as what actually works now. That's the line we hold with clients, and it's a more honest one than most vendor demos will give you.
One piece of plumbing is worth a sentence. The Model Context Protocol (MCP) is an open standard for connecting AI applications to external tools and data. It is how an agent reaches a knowledge graph, an ontology service, or a rules engine – useful plumbing, but not the substance of the decision.
Why this layer is a governance asset
There is a governance point here that data leaders should not miss: agentic AI makes the graph and ontology layer more important, not less.
Document RAG governs access at the document or index level, and that is fine for an assistant retrieving a chunk – a low-stakes read. An agent that can traverse "customer" to "payment record" to "issue refund" is not low-stakes. It needs governance at the level of the node, the edge, and the action. The ontology is the right place for that, because it already defines what those objects and actions are.
The graph layer also helps with two things document RAG handles poorly. It can carry source and lineage on the nodes and edges themselves, so provenance travels with the context. And a graph traversal is far more inspectable than a similarity score – you can show exactly which relationships led to an answer or an action.
The sharp version of the point: an autonomous agent traversing a well-connected graph can reach data and trigger actions no individual human would have stumbled across. The connectivity that makes the graph useful is the same connectivity that creates exposure. Good practice – and what we advise clients to build in from the start, not retrofit – is to model permissions into the ontology rather than bolt them onto the application edge, carry lineage on the graph, and place a human-in-the-loop checkpoint on consequential actions, not on every read. Be honest that graph-level security is genuinely harder than document-level security, because hidden nodes can still influence which paths are visible.
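A minimal sketch of what "permissions and lineage on the graph itself" can look like, with the clearance labels and source names invented: every edge carries its provenance and the clearance needed to read it, and traversal only follows edges the agent is allowed to see.

```python
# Governance carried on nodes and edges: provenance and clearance travel with
# the data, and traversal is filtered by what the agent may read.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Customer X", "Payment 991", relation="HAS_PAYMENT",
           source="billing_export_2026-03", clearance="finance")
g.add_edge("Payment 991", "Refund 17", relation="REFUNDED_BY",
           source="refunds_api", clearance="finance")
g.add_edge("Customer X", "Ticket 204", relation="RAISED",
           source="helpdesk", clearance="support")

def visible_edges(agent_clearances: set[str]):
    for u, v, d in g.edges(data=True):
        if d["clearance"] in agent_clearances:
            yield u, v, d

# A support-only agent never sees the payment path, and every edge it does see
# carries the provenance that can be shown alongside the answer or action.
for u, v, d in visible_edges({"support"}):
    print(u, "->", v, d["relation"], "from", d["source"])
```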
And the accountability is yours regardless. Under New Zealand's Privacy Act 2020 and Australia's Privacy Act 1988, responsibility for personal information sits with the organisation whether a human or an agent did the accessing (as at mid-2026). "The agent reached it on its own" is not a defence. It's worth noting that this is a moving target: in Australia, new transparency obligations for automated decision-making that significantly affects individuals commence on 10 December 2026 – so for any agent making or shaping consequential decisions, the explainability the graph layer gives you is becoming a compliance asset, not just good hygiene.
The honest part: cost, failure modes, and "you may not need a graph database"
Knowledge graphs, and especially ontologies, are expensive in a way that is easy to under-budget – because the cost is back-loaded. This is the part of the conversation we make sure happens before a client commits, not after.
Build cost is visible and gets planned for. Maintenance cost is what kills projects. An ontology is a living model of a business, and the business keeps changing – new products, reorganisations, renamed systems, revised rules. An unmaintained model drifts from reality and becomes misleading, which is worse than having no model at all. Ontology engineering is also a genuine specialist skill, and a scarce one – and it's noticeably thinner in the NZ and AU market than in larger talent pools offshore. That scarcity sharpens the risk: building something only one contractor or internal expert can maintain creates a single point of failure, and locally that single point is harder to replace than the project plan assumes.
If this sounds familiar, it should. This is not a new idea. The semantic-web and enterprise-ontology era promised machine-readable meaning across the organisation and largely collapsed under its own modelling burden – too much ontology built ahead of any concrete use case, colliding with messy real data that never matched the clean model.
So what has actually changed? Two things. LLMs are genuinely better at the relationship- and reasoning-shaped questions this layer serves, so the layer is useful in a way it often was not before. And LLMs make entity and relationship extraction from messy text dramatically cheaper, so a graph is cheaper to populate. What has not changed: the maintenance burden, the temptation to over-model, the gap between a clean model and dirty data, and the discipline it takes to scope tightly. LLMs make the graph cheaper to build. Nothing makes it cheaper to keep true – and in the field, "keeping it true" is where these projects live or die.
Needing relationships does not mean needing a graph database
Here is the counterpoint that saves the most money, and the one we find ourselves making most often in early architecture conversations. "Relationships matter" does not lead automatically to "build a graph database". Three distinct needs hide under that phrase.
Existence and traversal – "does A connect to C, and by what path?", where the depth is unknown or variable. This is what a graph database is genuinely good at: supply chains, dependency maps, networks, asset hierarchies.
Semantics and rules – "what does this relationship mean, and what rule applies?" This is a logic and reasoning problem. A graph database does not solve it for you. For business-rule enforcement, a rules engine is often the right tool – and many enterprises already own one. We saw this clearly with an insurer who came to us convinced they needed a knowledge graph: when we sorted their failing questions, almost all of them were rule-application questions – "is this claim within policy", "who is authorised to sign this off" – not traversal. The rules already lived in a business rules system they owned. The fix was exposing that to the AI, not standing up a graph database, and it saved them months of modelling. For formal inference over facts, a logic-programming approach such as Datalog is lighter and more deployable than a full ontology stack.
Definitional consistency – "which definition of revenue is authoritative?" This is a metadata and glossary problem, best solved with a good business-glossary layer.
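To make the second of those needs concrete: rule application needs evaluation, not traversal. A minimal sketch, with the policy limits and claim fields invented; in practice this is exactly the job of a rules engine you may already own.

```python
# "Is this claim within policy?" is rule evaluation, not graph traversal.
POLICY_LIMITS = {"windscreen": 1_500, "water_damage": 20_000}  # illustrative limits

def within_policy(claim: dict) -> bool:
    limit = POLICY_LIMITS.get(claim["category"])
    return limit is not None and claim["amount"] <= limit

print(within_policy({"category": "windscreen", "amount": 900}))       # True
print(within_policy({"category": "water_damage", "amount": 45_000}))  # False
```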
A graph database earns its place specifically when unknown-depth traversal is the core need – and even then, often as part of a GraphRAG hybrid rather than a replacement for vector search. Reaching for a graph database by default is simply the modern version of the over-engineering mistake – same trap, new decade.
A decision framework – and a first move
You don't decide this at the enterprise level. You decide it per use case. The wrong question is "should we adopt knowledge graphs?" The right one is "for this use case, what does the question actually require?"
Five signals will tell you most of what you need:
What shape are the failing questions? Document-lookup failures point to retrieval quality – stay with plain RAG and tune it. Relationship- and rule-shaped failures will not be fixed by tuning; you need structure.
Is it traversal or rules? "Trace the connections, find the path" points to a graph, often GraphRAG. "Apply the policy correctly" points to a rules engine or logic layer – likely something you already own.
Is it really a definitions problem? If teams disagree about what terms mean, you need a metadata or glossary layer, not a graph.
How much is the answer trusted to drive action? Low-stakes assistance tolerates "okay". An agent taking actions, or a high-cost decision, raises the bar – and that is where structure and provenance earn their keep.
Is there a real, bounded use case now? If the honest answer is "not yet, but it'd be good infrastructure", stop. Building ahead of the use case is the documented failure pattern – and the one we've watched sink the most well-intentioned projects.
The output of this is usually a sequence, not a single pick: tune plain RAG, add a rules layer, build a GraphRAG hybrid, stand up a metadata-glossary layer, or model a tightly scoped ontology. Your right first project is one bounded, high-value use case where relationship or rule questions are clearly failing today – never an enterprise-wide ontology. Prove the layer earns its keep with a real before-and-after, then expand only where the next use case justifies it. The teams we see succeed with this are, without exception, the ones that scoped narrow and earned the next step.
If you want a concrete starting point that needs no procurement and no budget: audit your current AI's failures. Over a week or two, log every question where the assistant gave a plausible-but-shallow answer, and sort each one into three buckets – document lookup, relationship or traversal, or rule and precedent. That sort is the diagnosis. If the failures cluster in document lookup, you have a RAG tuning job. If they cluster in relationships and rules, you have an evidence-based, specific case for a graph, a rules layer, or a hybrid – tied to real questions your organisation is asking, not to a slide.
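If it helps to see the shape of that log, here is a minimal sketch with invented rows; the only output that matters is where the tally clusters.

```python
# Tally the failure audit: which bucket do the shallow answers fall into?
from collections import Counter

failure_log = [
    {"question": "What does clause 14.2 say?",               "bucket": "document lookup"},
    {"question": "Which services break if we retire DB-B?",  "bucket": "relationship/traversal"},
    {"question": "Who can approve this exception?",          "bucket": "rule/precedent"},
    {"question": "Which customers sit on the legacy tier?",  "bucket": "relationship/traversal"},
]

print(Counter(row["bucket"] for row in failure_log))
# Counter({'relationship/traversal': 2, 'document lookup': 1, 'rule/precedent': 1})
```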
That is the conversation worth having with your team: not "should we buy a knowledge graph", but "here are the questions our AI is quietly getting wrong, and here is what each one actually needs".
If that diagnosis does point toward a graph, it's the work we do – see how DataSing designs and grounds knowledge graphs.
Written by
DataSing Team
AI & Data Specialists