Data Vault: the best fit for automation

The pattern-based design of Data Vault 2.0 greatly enhances the automation of the enterprise data warehouse, lakehouse, or mesh.

What is Data Vault 2.0?

Data Vault 2.0 is a data modeling method that offers a flexible, scalable, and agile approach to organizing and storing data in any data warehouse, lakehouse, or mesh.

First developed by Dan Linstedt in the 1990s, Data Vault has gained popularity ever since.

It is particularly well-suited for the automation of data integration while accommodating changes in source data structures over time.

What is Data Vault modeling?

Data Vault modeling breaks down all incoming data into three simple standard components, forming a model that is engineered to connect all the dots:

  • Hubs: Represent business entities (e.g., product or customer) and serve as the central point for connecting relationships.

  • Satellites: Contain descriptive attributes about the entities stored in hub tables, capturing changes over time.

  • Links: Capture relationships between entities and enable the modeling of complex business scenarios.

A hub comprises a unique business key for identification, a hash key to support parallel loading, a load date for technical historization, and a record source for debugging. The business key can vary; for instance, an employee can be identified by an employee number, and a car by a vehicle identification number (VIN). Multi-part business keys, which span multiple columns, are common.
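The hub structure described above can be sketched in a few lines of Python. This is a minimal illustration, not VaultSpeed's generated code: the names `hash_key`, `HubRow`, and `make_hub_row` are hypothetical, and hashing with MD5 over trimmed, upper-cased key parts joined by a `||` delimiter is an assumed convention chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic hash key from one or more business key parts.

    Assumed convention for this sketch: trim and upper-case each part,
    join the parts with a delimiter, then take the MD5 hex digest.
    """
    normalized = delimiter.join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_hash_key: str              # computed from the business key alone
    business_key: Tuple[str, ...]  # one element per business key column
    load_date: datetime            # when the key first arrived in the hub
    record_source: str             # which source system delivered the key


def make_hub_row(business_key: Tuple[str, ...], record_source: str,
                 load_date: Optional[datetime] = None) -> HubRow:
    """Build a hub row; the load date defaults to the current UTC time."""
    return HubRow(
        hub_hash_key=hash_key(*business_key),
        business_key=business_key,
        load_date=load_date or datetime.now(timezone.utc),
        record_source=record_source,
    )


# Single-part key: an employee identified by an employee number.
employee = make_hub_row(("E-1001",), record_source="hr_system")
# Multi-part key: a hypothetical combination of company code and employee number.
contractor = make_hub_row(("BE", "E-1001"), record_source="hr_system")
```

Because the hash key in this sketch depends only on the business key, any loader can compute it independently, without first looking up a generated identifier in another table.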

Links and satellites follow a similar structure, with variations: a link implements the relationships between business keys, and a satellite holds the structure of the descriptive data. Despite these small differences, each entity type follows a clear pattern. The loading procedures follow patterns as well; all hub loading procedures, for instance, closely resemble one another.
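To illustrate why such repeating patterns lend themselves to automation, here is a hedged sketch of two generic loaders: one routine that can load any hub, and one that can load any satellite. The tables are plain in-memory dictionaries, and the function names, column names, and hashing convention are assumptions made for this example rather than part of the Data Vault standard or of VaultSpeed's output.

```python
import hashlib
from datetime import datetime, timezone


def _md5(values, normalize=True):
    """MD5 over values joined by '||'; optionally trim and upper-case them."""
    parts = [str(v) for v in values]
    if normalize:
        parts = [p.strip().upper() for p in parts]
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()


def load_hub(hub, staged_rows, key_columns, record_source, load_date=None):
    """Insert every business key not yet in the hub; return the insert count."""
    load_date = load_date or datetime.now(timezone.utc)
    inserted = 0
    for row in staged_rows:
        business_key = tuple(row[c] for c in key_columns)
        hk = _md5(business_key)
        if hk not in hub:  # existing keys are never updated or duplicated
            hub[hk] = {"business_key": business_key,
                       "load_date": load_date,
                       "record_source": record_source}
            inserted += 1
    return inserted


def load_satellite(satellite, staged_rows, key_columns, attribute_columns,
                   record_source, load_date=None):
    """Append a new version only when the descriptive attributes changed."""
    load_date = load_date or datetime.now(timezone.utc)
    inserted = 0
    for row in staged_rows:
        hk = _md5(row[c] for c in key_columns)
        # The hash diff keeps the original casing so every change is detected.
        hash_diff = _md5((row[c] for c in attribute_columns), normalize=False)
        history = satellite.setdefault(hk, [])
        if not history or history[-1]["hash_diff"] != hash_diff:
            history.append({"hash_diff": hash_diff,
                            "attributes": {c: row[c] for c in attribute_columns},
                            "load_date": load_date,
                            "record_source": record_source})
            inserted += 1
    return inserted


# The same two routines serve different entities; only the metadata differs.
hub_customer, hub_product, sat_customer = {}, {}, {}
day1 = [{"customer_no": "C1", "city": "Ghent"}, {"customer_no": "C2", "city": "Leuven"}]
day2 = [{"customer_no": "C1", "city": "Antwerp"}, {"customer_no": "C2", "city": "Leuven"}]

for batch in (day1, day2):
    load_hub(hub_customer, batch, ["customer_no"], "crm")
    load_satellite(sat_customer, batch, ["customer_no"], ["city"], "crm")
load_hub(hub_product, [{"sku": "P-9"}], ["sku"], "erp")
```

After both batches, the customer hub holds two keys, while the satellite holds three versions: customer C1 moved and gained a second version, and customer C2 did not change.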

What is Data Vault architecture?

The Data Vault standard includes architectural guidelines for the structure of a data warehouse, lakehouse, or mesh, which VaultSpeed follows in full:

[Figure: Data Vault architecture]

The integration and storage area is crucial: it absorbs changes to and additions of sources, and it serves as a backward-compatible layer for requests coming from the subsequent data consumption layers. It consists of three layers:

The landing zone is where all source data initially enters the data platform. The data maintains its source format and model.

The Raw Data Vault contains raw, historical, unfiltered data from the sources. This raw data records the facts of the source system: it proves that something exists or has occurred.

The Business Data Vault harmonizes business keys/terms from the source system with the anticipated model, ensuring alignment and compliance. It is also the layer where additional business logic is implemented.
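The division of responsibilities between the three layers can be sketched as follows. The example is deliberately flattened: a real Raw or Business Data Vault is modeled as hubs, links, and satellites, and every name here (the source columns, the business terms, the country lookup) is a hypothetical chosen for illustration.

```python
from datetime import datetime, timezone

# Landing zone: rows stay in the source format and model.
landing_zone = [
    {"CUST_NO": " c-17 ", "CNTRY": "BE"},
    {"CUST_NO": "C-42", "CNTRY": "NL"},
]


def load_raw_vault(landing_rows, record_source, load_date):
    """Store every source row unchanged, adding only technical metadata."""
    return [
        {"payload": dict(row), "record_source": record_source, "load_date": load_date}
        for row in landing_rows
    ]


def build_business_vault(raw_rows, column_to_term, country_names):
    """Rename source columns to business terms and apply business logic."""
    result = []
    for raw in raw_rows:
        row = {column_to_term[col]: value for col, value in raw["payload"].items()}
        # Harmonize the business key with the format the target model expects.
        row["customer_number"] = row["customer_number"].strip().upper()
        # Example business rule: derive a readable country name.
        row["country_name"] = country_names.get(row["country_code"], "UNKNOWN")
        result.append(row)
    return result


raw_vault = load_raw_vault(
    landing_zone, "crm", datetime(2024, 1, 1, tzinfo=timezone.utc)
)
business_vault = build_business_vault(
    raw_vault,
    column_to_term={"CUST_NO": "customer_number", "CNTRY": "country_code"},
    country_names={"BE": "Belgium", "NL": "Netherlands"},
)
```

The raw layer keeps the untrimmed value `" c-17 "` exactly as the source delivered it, so the business rules can be changed and re-applied later without reloading the sources.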

VaultSpeed ensures no disruption in ingestion, transformation, and modeling by delivering automated code that adheres to the Data Vault standards more closely than any other automation tool, which earned it the first Data Vault certification.