Patrick Van Deven & Hans Hultgren
Patrick Van Deven is CEO of VaultSpeed, which builds the agentic framework for enterprise data vault automation: AI agents that handle discovery, code generation, and deployment while keeping humans in control of the decisions that matter. Hans Hultgren is the founder of Genesee Academy, the leading international training organization for Data Vault and enterprise data modeling. This is a conversation between an automation vendor building AI-powered tools and the independent authority training the people who use them.
A note on terminology
Throughout this piece we use two terms from Hans's practice at Genesee Academy. Core Business Concepts (CBCs) are the fundamental business entities an organization cares about: customers, products, orders, contracts. Natural Business Relationships (NBRs) are the connections between those concepts as the business actually experiences them: a customer places an order, a contract covers a product. CBCs and NBRs are deliberately separated from technical or source-system definitions. They represent what the business means, not what the database contains.
Why AI accelerates data vault discovery but stalls at the decision point
Patrick: AI can now read source schemas, scan business documents, ingest meeting transcripts, and generate transformation code. We run these capabilities daily and the results are often genuinely impressive. Discovery agents turned loose on business documentation, technical metadata, and transcribed customer calls identified CBCs and proposed NBRs that held up under expert review. In several cases the agents mapped physical structures to business meaning faster and more completely than manual work would have.
So the capability is undisputed. The problem is the assumption that rides along with it: that AI-driven data vault modeling removes the need for human decision-making. It does not.
We had a customer with three source systems, each containing a "customer" table. The discovery agent found all three, proposed they were related, and correctly identified overlapping attributes. What it could not tell us was whether they represented the same customer. That answer depended on a business decision made years ago by people who had since left the company. One system tracked customers by billing relationship, another by service contract, a third by physical location. Three valid perspectives. One of them had to win for any given downstream use, and the winning answer changed depending on which regulatory report you were building.
The mechanical work of finding those tables, scanning the schemas, and proposing the overlap took the agent about twenty minutes. The actual decision about what "customer" means in this organization took two weeks of meetings with people from three departments. That ratio is roughly what we see everywhere. The discovery is fast. The decisions that follow are exactly as slow as they always were, because they require context, authority, and organizational consensus that no model can provide.
When teams confuse the speed of discovery with the speed of resolution, they get into trouble. The model arrives at the decision point quickly and then, because it has to produce an output, picks an answer. The answer looks confident. It lands in a pipeline. The person downstream has no way of knowing that the choice between billing-customer, service-customer, and location-customer was made by an algorithm that lacked the context to make it. By the time someone notices, the number has already shown up in a report.
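One way to keep a model from silently picking an answer is to make the decision point an explicit object that refuses to resolve until a named person has decided. The following is a minimal sketch of that idea, not a description of any vendor's implementation. The class names, the system names, the likelihood values, and the "regulatory reporting lead" role are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class UnresolvedDecision(Exception):
    """Raised when a pipeline asks for an answer no person has signed off on."""


@dataclass(frozen=True)
class Candidate:
    definition: str      # e.g. "billing relationship"
    source_system: str   # hypothetical system name
    likelihood: float    # the agent's estimate, informational only


@dataclass
class DecisionPoint:
    concept: str
    context: str                      # the downstream use this decision applies to
    candidates: list[Candidate]
    chosen: Optional[Candidate] = None
    decided_by: Optional[str] = None  # a named person or role, never the agent

    def decide(self, definition: str, decided_by: str) -> None:
        """Record a human decision for one of the proposed candidates."""
        if not decided_by.strip():
            raise ValueError("a decision needs a named owner")
        for candidate in self.candidates:
            if candidate.definition == definition:
                self.chosen, self.decided_by = candidate, decided_by
                return
        raise ValueError(f"{definition!r} is not among the proposed candidates")

    def resolve(self) -> Candidate:
        """Return the governing definition, or refuse if nobody has decided."""
        if self.chosen is None:
            raise UnresolvedDecision(
                f"{self.concept!r} for {self.context!r} still awaits a human decision"
            )
        return self.chosen


# Hypothetical output of a discovery run over three source systems.
customer = DecisionPoint(
    concept="customer",
    context="regulatory report (hypothetical)",
    candidates=[
        Candidate("billing relationship", "billing_system", 0.46),
        Candidate("service contract", "contract_system", 0.31),
        Candidate("physical location", "site_system", 0.23),
    ],
)
```

Calling `customer.resolve()` at this point raises `UnresolvedDecision`, even though one candidate has the highest likelihood. Only after something like `customer.decide("service contract", decided_by="regulatory reporting lead")` does the pipeline get an answer, and that answer carries the name of whoever made it. Because the decision is tied to a `context`, a different report can hold a different decision for the same concept.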
The progress is real: what used to take weeks of manual spelunking now takes hours. But arriving at the decision faster is only valuable if you recognize that you have arrived at a decision, and that a person still needs to make it.
Hans: Within the limits of currently available LLM offerings, we have tested how well AI can discover, identify, and describe the CBCs and NBRs inherent to a given business subject area. We ran these tests with our Genesee Academy Modeling Assistant, and the results have been surprisingly impressive.
A bit of background: with automation tooling widely available and widely emphasized over the past decade, our attention has been shifting to the more business-facing aspects of modeling in the enterprise data space. In particular, conceptual modeling, business mapping, and logical data modeling have been a growing emphasis. In our practice, we focus on what we call the Core Business Concepts (CBCs) and Natural Business Relationships (NBRs). This delineation matters because it differentiates CBCs and NBRs from other meanings frequently associated with concepts and relationships in other modeling approaches.
The process to capture (discover, identify and describe) the CBCs and NBRs relies on interactive sessions with business representatives (ELM Workshops), business interviews, and lastly, review of business documents. Notably absent is a review of source systems or other existing technical documentation. The CBCs and NBRs are then codified in a set of artifacts including definitions and a logical data model (ELM). The resulting model can then be directly mapped to a physical Data Vault model and deployed on a variety of technology stack alternatives.
Now back to the AI-powered Genesee Academy Modeling Assistant: given meeting minutes, workshop transcripts, and other available business documents, AI is able to capture (discover, identify and describe) the CBCs and NBRs. It also provides insight into the meaning of both the concepts and the relationships, additional candidate concepts and relationships that may be relevant, and a set of follow-on questions to consider in relation to the business case.
In our testing, we review and refine this output and then move forward to the creation of the logical data model (ELM).
As mentioned above, the results are surprisingly good. The Genesee Academy Modeling Assistant typically captures all of the CBCs and NBRs, including additional considerations and insights that are valuable to the modelers. There are exceptions, of course, but the general results are excellent.
Patrick: What Hans describes matches what we see on the automation side. The discovery is genuinely good. Where it breaks is the step after discovery: the decisions that require context no model has access to.
The limits of AI in enterprise data vault modeling
Hans: While it may be true that you can train AI to do just about anything that you can train a human to do, there appear to be limits, at least for now. These limits seem to appear when there is ambiguity and we need to exercise some judgement. This may be because there are too many variables to resolve or simply that we need to iterate through various scenarios, each requiring interactive communications with multiple resources. Ultimately the humans-in-the-loop will need to address integration decisions, CBC levels and classifications, model reconciliation, and domain boundaries, among others. None of these tasks can be readily resolved through pre-determined decision criteria or analysis of prior scenarios, partly because each situation is truly unique and partly because the variables are constantly changing.
This brings to mind a common military saying: "combat is a highly fluid situation." Combat conditions change rapidly, so you must be flexible, make quick decisions, and adapt to uncertainty. This is increasingly true in business as well. To accommodate this ever-changing environment, the enterprise data function (including the modeling and mapping of data) needs to be fluid and agile. Because the enterprise is constantly in motion, it requires a form of kinetic modeling: modeling that is hyper-agile and continually responding to change. The model has to live and adapt; it becomes a living, breathing model. So even when we do navigate the unique intricacies of a modeling challenge (integration decisions, CBC levels and classifications, model reconciliation, and domain boundaries) and resolve it by making a decision, we have to recognize that this is not the end state. The next change is around the corner, maybe next year or maybe tomorrow.
When one ambiguous entity becomes hundreds: AI at scale
Patrick: Hans is describing single decision points. What happens when you have hundreds of them?
Large enterprises carry layers of unresolved disagreement about what their data means, and most of those disagreements are deliberate. Finance defines profitability based on recognized revenue. Operations defines it based on margin after fulfillment costs. Any senior manager knows this. The definitions diverge because the functions they serve diverge, and forcing alignment across those boundaries was never worth the cost. Each department ran its own reports, used its own logic, and produced numbers that were correct within its own context. That worked, as long as you stayed inside the boundaries.
The trouble starts when you try to cross over. A board member asks a question that spans finance and operations. A regulatory report requires a consolidated view. A data product needs to serve users from both sides. Suddenly the definitions that made sense in isolation collide, and someone has to decide which one governs, or how to reconcile them into something that holds up across the boundary.
This is where AI needs clear signposts. When an agent navigates a knowledge graph that spans multiple functional domains, it needs to know where the boundary is, what each side means by the same term, and what the institutional rule is for resolving the conflict. Without those signposts, the agent will pick whichever path looks most probable, which means it will silently apply one function's definition in a context where the other one should have governed.
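A signpost of this kind can be pictured as a small lookup keyed by the term and the boundary being crossed. The sketch below is illustrative only: the `Signpost` and `SignpostRegistry` names are hypothetical, and so are the rule text and the decision owner. The one behavior that matters is that a missing signpost raises an error instead of falling back to whichever definition looks most probable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signpost:
    term: str
    from_domain: str
    to_domain: str
    rule: str        # how the term translates at this crossing
    decided_by: str  # who made that decision


class MissingSignpost(LookupError):
    """No institutional rule exists yet for this boundary crossing."""


class SignpostRegistry:
    def __init__(self) -> None:
        self._rules: dict[tuple[str, str, str], Signpost] = {}

    def add(self, signpost: Signpost) -> None:
        key = (signpost.term, signpost.from_domain, signpost.to_domain)
        self._rules[key] = signpost

    def cross(self, term: str, from_domain: str, to_domain: str) -> Signpost:
        """Return the governing rule, or fail loudly instead of guessing."""
        try:
            return self._rules[(term, from_domain, to_domain)]
        except KeyError:
            raise MissingSignpost(
                f"no rule for {term!r} crossing {from_domain} -> {to_domain}"
            ) from None


registry = SignpostRegistry()
registry.add(
    Signpost(
        term="profitability",
        from_domain="finance",
        to_domain="operations",
        # Hypothetical rule text, for illustration only.
        rule="restate recognized revenue as margin after fulfillment costs",
        decided_by="finance and operations steering group (hypothetical)",
    )
)
```

An agent that calls `registry.cross("profitability", "operations", "finance")` gets a `MissingSignpost` error in this example, because only the finance-to-operations direction has been decided. That error is the queue item a person needs to pick up.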
Earlier I described a single ambiguous concept: three source systems, one "customer" table in each, someone has to decide which definition wins. Now multiply that. When you feed metadata from a data vault with 18,000 objects into a discovery agent, you are not dealing with one ambiguous CBC. You are dealing with hundreds. Five satellites describe "product" with overlapping attribute names but different data types. Seven tables reference "account" across four functional domains. The agent flags every conflict, produces a queue of questions that would take a team months to resolve, and each answer depends on institutional context that lives in people's heads, not in the metadata.
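The mechanical half of that work, spotting attributes that share a name but disagree on data type, is simple to sketch. The example below assumes the metadata is available as (table, attribute, data type) rows. The table and column names are invented for illustration and do not come from any real vault.

```python
from collections import defaultdict
from typing import Iterable


def find_type_conflicts(
    columns: Iterable[tuple[str, str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Group columns by attribute name and keep those with more than one data type.

    Returns {attribute: {data_type: [tables using that type]}}.
    """
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for table, attribute, data_type in columns:
        seen[attribute.lower()][data_type.upper()].append(table)
    return {
        attribute: {dtype: sorted(tables) for dtype, tables in types.items()}
        for attribute, types in seen.items()
        if len(types) > 1
    }


# Hypothetical metadata for three satellites that all describe "product".
metadata = [
    ("sat_product_erp", "product_name", "VARCHAR(100)"),
    ("sat_product_web", "product_name", "TEXT"),
    ("sat_product_erp", "weight", "DECIMAL(10,2)"),
    ("sat_product_wms", "weight", "FLOAT"),
    ("sat_product_erp", "sku", "VARCHAR(20)"),
    ("sat_product_web", "sku", "VARCHAR(20)"),
]

conflicts = find_type_conflicts(metadata)
for attribute, types in sorted(conflicts.items()):
    print(attribute, types)
```

The function only produces the queue of questions. Whether `weight` should be a decimal or a float, and which satellite is authoritative, is still a decision for someone who knows the systems.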
A European bank we work with faces BCBS 239 obligations, which require full lineage of data transformations and proof of how every number in a regulatory report came to be. The data exists in a vault built over years from dozens of source systems. All the physical metadata is there. What is missing are the institutional decisions: what does "customer exposure" mean in the context of this specific regulation, which physical tables produce that number, and who has the authority to say so.
AI can surface every table, every attribute, every potential path through the vault in minutes. It can propose a mapping and rank alternatives by likelihood. But "most likely" is not language anyone will put on a regulatory filing. The person who signs that report needs to know the mapping was chosen by someone with the authority and context to defend it.
The organizations we see getting this right treat AI as the fastest way to find where the boundaries are and what conflicts exist at each crossing point. They use those flagged conflicts to build institutional rules: clear signposts that tell the system, when you cross from finance into operations, here is how this term translates, and here is who made that decision. The organizations that struggle skip that step. They treat the AI's proposed resolution as the final answer, and the gap only becomes visible when a regulator, an auditor, or a frustrated business user pulls on the thread.
Why template discipline beats AI-generated code in data vault
Hans: The building blocks of Data Vault modeling are well defined and unambiguous. We capture a set of CBCs and NBRs and from that list we create the models via keys (Hubs), relationships (Links) and context (Satellites). These are codified and deployed using sets of pre-defined templates.
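As a rough illustration of that mapping, the sketch below turns a list of CBCs and NBRs into hub, link, and satellite names. It is a deliberately simplified assumption, not the Genesee or VaultSpeed method: the `hub_`, `link_`, and `sat_` naming convention is hypothetical, and it creates exactly one satellite per hub, whereas real models often have several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NBR:
    name: str                  # e.g. "customer places order"
    concepts: tuple[str, ...]  # the CBCs this relationship connects


def to_vault_objects(cbcs: list[str], nbrs: list[NBR]) -> dict[str, list[str]]:
    """Derive object names: one hub and one satellite per CBC, one link per NBR."""
    known = set(cbcs)
    for nbr in nbrs:
        missing = [c for c in nbr.concepts if c not in known]
        if missing:
            raise ValueError(f"NBR {nbr.name!r} references unknown CBCs: {missing}")
    return {
        "hubs": [f"hub_{c}" for c in cbcs],
        "links": [f"link_{'_'.join(n.concepts)}" for n in nbrs],
        "satellites": [f"sat_{c}" for c in cbcs],
    }


cbcs = ["customer", "order", "contract", "product"]
nbrs = [
    NBR("customer places order", ("customer", "order")),
    NBR("contract covers product", ("contract", "product")),
]
vault = to_vault_objects(cbcs, nbrs)
```

Given the same CBC and NBR lists, the function always returns the same object names, which is the property the templates rely on.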
Because modelers and engineers are inherently creative, there will always be a temptation to customize these templates and create a set of variants. While this may appear beneficial for the specific case at hand, the broader impact may be detrimental to the program. In the same way that a passing unit test can give false confidence in a newly coded module, a variant can look fine in isolation; it is only through regression testing that we uncover the issues it may cause.
There are two considerations here. The first is that a variant is very unlikely to be needed (the patterns and related templates should accommodate all scenarios). The second is that the true cost of the variant is likely much greater than anticipated: complexity, ambiguity, inconsistency, auditability issues, training, documentation, on-boarding, maintenance, and so on.
The case for deterministic templates over language model code generation
Patrick: Hans is making the case for template discipline from the modeling side. We arrived at the same conclusion from the engineering side.
Even for the mechanical work, pure AI code generation turned out to be the wrong answer. We tried it both ways before settling on a hybrid.
Auditability. Our customers in regulated environments are blunt about this. One banking client put it simply: if the error rate is one in a million, and a hallucinated value ends up on a regulatory report, that is a career-ending event for the person who signed it. The concern is not theoretical.
A satellite load, a hub load, a link load: these are repeatable patterns applied thousands of times across an enterprise. The code for each pattern has been written, tested, and audited. We use templated, pre-written code for these structures, and the output is identical every time. There is no drift, no variation, no possibility of a generated value that does not match the template. For regulated customers, that guarantee matters more than any efficiency gain from generating the code fresh.
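The "identical every time" property can be shown with a small template renderer. This is a minimal sketch using Python's standard `string.Template`, not VaultSpeed's actual templates. The SQL shape, the column names (`load_dts`, `record_source`), and the table names are assumptions chosen for illustration.

```python
import hashlib
from string import Template

# A hypothetical hub-load pattern: insert business keys not yet present in the hub.
HUB_LOAD = Template(
    "INSERT INTO $hub ($hash_key, $business_key, load_dts, record_source)\n"
    "SELECT DISTINCT s.$hash_key, s.$business_key, s.load_dts, s.record_source\n"
    "FROM $staging AS s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub AS h WHERE h.$hash_key = s.$hash_key\n"
    ");"
)


def render_hub_load(hub: str, staging: str, hash_key: str, business_key: str) -> str:
    """Fill the template. substitute() raises KeyError if a placeholder is missing."""
    return HUB_LOAD.substitute(
        hub=hub, staging=staging, hash_key=hash_key, business_key=business_key
    )


def fingerprint(sql: str) -> str:
    """A stable hash of the generated code, suitable for an audit log."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


first = render_hub_load("hub_customer", "stg_customer", "customer_hk", "customer_id")
second = render_hub_load("hub_customer", "stg_customer", "customer_hk", "customer_id")
```

Here `first == second` holds on every run, and so does equality of their fingerprints. An auditor can compare the stored fingerprint with a fresh render, which is a check that has no equivalent when the code is regenerated by a language model on each run.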
Cost. The economics point the same direction. Hub, link, and satellite patterns repeat across an enterprise at enormous scale. Spending tokens to regenerate them from scratch each time is pure waste. Templates produce the same result instantly and at zero marginal cost. At 20 CBCs and 35 NBRs, the difference is barely noticeable. At 18,000 objects, the token cost of regeneration becomes a real budget line.
Our engineering team runs the deterministic pipeline on the cheapest model available, because the execution step does not require reasoning. The agent follows predefined skills and applies known templates. The expensive models, the ones that can actually think, are reserved for configuration and assembly: deciding which templates to apply, how to map CBCs and NBRs to physical structures, how to handle the ambiguity in source systems. That split, cheap execution for known patterns and expensive reasoning for genuine decisions, is what makes the economics work at scale.
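The split can be expressed as a routing rule over task kinds. The sketch below is an assumption about how such a rule might look, not the pipeline described above. The `Tier` and `TaskKind` names are hypothetical, and the tiers are abstract labels that would map to whatever models an organization actually uses.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"          # follows predefined skills, applies known templates
    REASONING = "reasoning"  # reserved for configuration and assembly


class TaskKind(Enum):
    APPLY_TEMPLATE = "apply_template"
    SELECT_TEMPLATE = "select_template"
    MAP_CONCEPTS_TO_PHYSICAL = "map_concepts_to_physical"
    HANDLE_SOURCE_AMBIGUITY = "handle_source_ambiguity"


# Only pure execution of a known pattern goes to the cheap tier.
_EXECUTION_TASKS = frozenset({TaskKind.APPLY_TEMPLATE})


def route(kind: TaskKind) -> Tier:
    """Send known patterns to cheap execution and everything else to reasoning."""
    return Tier.CHEAP if kind in _EXECUTION_TASKS else Tier.REASONING


plan = {kind.value: route(kind).value for kind in TaskKind}
```

Defaulting unknown or new task kinds to the reasoning tier is a design choice in this sketch: a task is only moved to the cheap tier once it has been reduced to a known pattern.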
How AI changes who can build a production data vault
For most of the past twenty years, Data Vault, like any serious integration methodology, required large teams and expensive specialists. The techniques worked and had been proven at scale, but applying them well demanded fifteen-year veterans who had built vaults across multiple enterprise environments. If you did not have access to that expertise, you were taking a serious risk, and most organizations decided the risk was not worth it.
The alternative was to push raw data into a lake and let the business figure out the reconciliation. Every department, every analyst, every dashboard built its own version of the truth from whatever they could access. The warehouse stopped serving as a shared foundation and became, at best, a staging area.
AI changes who can afford to do this properly. The accumulated expertise that used to live exclusively in the heads of senior practitioners can now be captured in agent skills, templates, and a knowledge graph that maps the relationship between CBCs, NBRs, and their physical implementations. We spent years encoding that expertise into a deterministic rule-based system. That worked, but it was expensive to build and expensive to maintain. The current generation of tools captures the same knowledge in a form that is cheaper to maintain and easier to extend.
A team of ten, using these tools, can now produce work that previously required forty people. The ten are not less skilled. They are working with tools that carry institutional knowledge they would otherwise need to build up over years of direct experience. That is what changes the economics of source analysis before the vault, and data products after it: the two transitions that have always stayed manual regardless of how well the vault itself was automated.
The hiring question changes too. You need fewer people who can write transformation code by hand, and more who can evaluate what the system proposes, validate business logic, and make the judgment calls the system flags. That shift opens the door to teams and organizations that could never have attempted this before, because the barrier was always access to a scarce pool of deep specialists, and that barrier is lower now than it has ever been.
What has not changed: human judgment in data vault modeling
Hans: AI, along with automation tools and new modeling approaches, is poised to greatly enhance our enterprise data capabilities, especially as those capabilities move beyond the Global 2000 to other large organizations and SMEs.
While AI is a major part of this new reality, it will be those who embrace the hybrid approach who truly benefit. Organizations that know how to combine AI, automation and tooling, various modeling techniques, and program management principles will be positioned to gain the maximum benefit.
The tools keep getting better. The need for people who know what to do with them does not go away. AI surfaces the conflicts faster than any manual process could. It does not resolve them. That is still a human job. If anything, the speed of discovery makes the quality of judgment more important, not less. The faster AI brings you to a decision point, the more consequential it is to make the right call when you get there.
© June 2026 Patrick Van Deven & Hans Hultgren. All rights reserved. No part of this article may be reproduced, distributed, or transmitted in any form or by any means without the prior written permission of the authors. If cited or referenced, full attribution is required: "Patrick Van Deven and Hans Hultgren, The Human Decisions AI Cannot Make".

