Designing a Data Platform for Continuous M&A at Thomson Reuters

Industry: Legal, Tax & Regulatory Technology

Use Case: M&A Data Integration and EDW Migration

Platform: Snowflake on AWS

Technologies: VaultSpeed, Snowflake, Data Vault 2.0

100+ data sources

11 data products live

Thomson Reuters is a global information provider serving the legal, tax, and policy sectors, with approximately $7.5 billion in revenue and 25,000 employees worldwide. The company delivers subscription-based AI and content products to professionals across the globe. With a history shaped by continuous mergers and acquisitions, Thomson Reuters faced a critical challenge: how to unify a constantly shifting data landscape across more than 100 business systems while simultaneously modernizing its analytics infrastructure.

The Challenge: Fragmented Data in a Fast-Moving Enterprise

Thomson Reuters’ growth through M&A had created a deeply fragmented data environment. Each acquisition brought its own systems, data models, and technical approaches. Legacy systems remained in use long after the deals closed, data ownership structures shifted with organizational changes, and documentation was inconsistent across teams.

The impact was significant: data silos and duplication were widespread, incompatible technical approaches coexisted, visibility into data lineage was limited, and technical debt was growing steadily. At the same time, the organization's rapid expansion into AI-powered products was sharply increasing demand for trusted, accessible data.

The team faced a balancing act: meeting urgent business data demands while building a sustainable foundation for the future, reducing dependency on risky legacy systems, and delivering quick wins without compromising the long-term transformation vision.

The Approach: A Unified Data Layer Built on Data Vault and VaultSpeed

Rather than attempting a monolithic migration, Thomson Reuters designed a data product architecture centered on a unified data layer running on Snowflake. The architecture follows a clear flow: sources feed into a raw landing zone, then into curated data assets, through a unified data layer modeled using Data Vault 2.0 (with Hubs, Links, and Satellites), and finally into a data product marketplace for consumption by BI platforms, AI applications, and reverse ETL processes.
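To make the Hub, Link, and Satellite structure concrete, here is a minimal Python sketch of how the three record types relate. It is an illustration under stated assumptions, not Thomson Reuters' model or VaultSpeed's generated code: the class names, field names, source names ("crm_a"), sample keys, and the choice of MD5 for hashing are all hypothetical, although hashing normalized business keys is a common Data Vault 2.0 convention.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more normalized business keys."""
    normalized = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hash_diff(attributes: dict) -> str:
    """Fingerprint descriptive attributes so changed rows are cheap to detect."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    """One row per unique business key (for example, a customer number)."""
    hub_hk: str
    business_key: str
    record_source: str
    load_ts: datetime


@dataclass(frozen=True)
class LinkRow:
    """A relationship between two or more hubs."""
    link_hk: str
    hub_hks: tuple
    record_source: str
    load_ts: datetime


@dataclass(frozen=True)
class SatelliteRow:
    """Descriptive attributes for a hub, kept separately per source system."""
    hub_hk: str
    diff: str
    attributes: dict
    record_source: str
    load_ts: datetime


now = datetime.now(timezone.utc)
customer_hk = hash_key("C-1001")
product_hk = hash_key("P-77")

hub_customer = HubRow(customer_hk, "C-1001", "crm_a", now)
link_customer_product = LinkRow(
    hash_key("C-1001", "P-77"), (customer_hk, product_hk), "crm_a", now
)

attrs = {"name": "Acme Legal LLP", "country": "CA"}
sat_customer_crm = SatelliteRow(customer_hk, hash_diff(attrs), attrs, "crm_a", now)

# The same customer arriving from a second (hypothetical) acquired system,
# with different casing and padding, resolves to the same hub key.
acquired_hk = hash_key(" c-1001 ")
```

Because the hub key depends only on the normalized business key, a newly acquired system can attach its own satellite to an existing hub without changing the tables already in place, which is one reason the pattern suits a landscape that keeps changing.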

VaultSpeed was selected as the automation engine to drive the Data Vault modeling and code generation. This model-driven approach replaced manual development, ensuring standardized patterns across every domain and enabling the team to keep pace with constant business change from ongoing M&A activity and internal transformations.
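The general idea behind model-driven code generation can be sketched in a few lines of Python: a small metadata model is expanded through one shared template, so every domain gets structurally identical code. This is a simplified illustration only and does not reflect VaultSpeed's actual interface, templates, or output; the template text, the `raw_vault` schema name, and the entity and column names are assumptions made for the example.

```python
from string import Template

# One shared pattern for every hub table (hypothetical naming convention).
HUB_DDL = Template(
    "CREATE TABLE IF NOT EXISTS ${schema}.hub_${entity} (\n"
    "    hub_${entity}_hk CHAR(32) NOT NULL,\n"
    "    ${business_key} VARCHAR NOT NULL,\n"
    "    load_ts TIMESTAMP_NTZ NOT NULL,\n"
    "    record_source VARCHAR NOT NULL,\n"
    "    PRIMARY KEY (hub_${entity}_hk)\n"
    ");"
)

# The "model": metadata describing each business entity and its business key.
MODEL = [
    {"entity": "customer", "business_key": "customer_number"},
    {"entity": "product", "business_key": "product_code"},
]


def generate_hub_ddl(model, schema="raw_vault"):
    """Render one CREATE TABLE statement per entity in the metadata model."""
    return [HUB_DDL.substitute(schema=schema, **spec) for spec in model]


statements = generate_hub_ddl(MODEL)
print(statements[0])
```

Adding a domain then means adding one entry to the model rather than hand-writing another table definition, which is the property that keeps patterns standardized as new sources arrive.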

The enterprise data model was designed to be continuously evolving, covering key business domains including:

• Customer and Party

• Product and Sales

• Order Management and Deliveries

• Invoicing, Billing and Accounts Receivable

• Finance and Subscriptions

• Marketing, Channel and Events

• HR and IT Management

Underpinning this architecture, an enterprise data catalog, marketplace, and data lineage and access management layer provided governance and discoverability across the organization.

The Journey: Incremental Progress Over Two Years

The transformation began in early 2024 with team formation. At the outset, the team had little Data Vault experience, but it built its foundations with data pipelines on Snowflake on AWS and began the EDW migration with the first core data products, covering the Customer, Product, Invoicing, and Accounts Receivable domains.

By mid-2024, four data products were live. Through 2025, community building and stakeholder engagement intensified, the internal data marketplace was launched, and the model expanded to cover Subscriptions, Orders, Deliveries, Finance, and Usage data—bringing the total to 11 live data products.

In 2026, the team reached a major milestone: decommissioning of the legacy Enterprise Data Warehouse began, while additional data products continue to go live. A major SAP integration is also in progress.

Transformation at a Glance


2024 (Start) | 2026 (Now)
Fragmented data landscape | Unified data layer
Manual processes | Automated with VaultSpeed
Limited visibility | Enterprise catalog live
Aging EDW | Migration in progress

The Impact: Flexibility, Speed, and Confidence

Two years in, the results are clear. Data Vault delivered the flexibility the team needed to absorb constant changes from M&A and business transformation. VaultSpeed automation kept the project on schedule, replacing what would have been error-prone manual development with standardized, repeatable patterns.

The organization has successfully migrated major legacy warehouses, core data products are operational across the enterprise, and data is now accessible through an internal marketplace rather than being locked in silos. The democratization of data access—making it available to all, not just the traditionally connected teams—has been a key outcome.

The Takeaway: Lessons from Scaling Data Integration

Thomson Reuters’ experience offers valuable lessons for any enterprise navigating data integration at scale. A multi-year transformation requires stamina and consistent stakeholder engagement. Different audiences, from technical teams to executive sponsors, need the value articulated in different ways. And investing in automation from the start is not optional: manual implementation at this scale would have been an invitation to failure.

The team also identified four key success factors for building with Data Vault: strong skills and training in business process knowledge and automation tooling; the right data platform supporting parallel processing; automation of development for faster delivery and standardized code; and an incremental build approach following agile practices rather than big-bang implementation.

VaultSpeed has been central to this transformation. By automating Data Vault modeling and code generation, it enabled Thomson Reuters to scale its data platform across domains without accumulating technical debt—even as the business continued to evolve through acquisitions and transformation. As the platform expands into new domains and the legacy EDW is fully decommissioned, Thomson Reuters is well positioned to support its growing AI ambitions with a trusted, unified data foundation.

It's time to 10x your data delivery

VaultSpeed automates the transformation of data scattered across dozens of source systems into governed, production-ready pipelines, native to your cloud data platform.
