Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located.
Unlike the traditional extract, transform, load (ETL) process, the data remains in place and the source system is accessed in real time. This reduces the risk of data errors and avoids the workload of moving data that may never be used.
Unlike a federated database system, data virtualization does not attempt to impose a single data model on the data, which may be heterogeneous. The technology also supports writing transaction data updates back to the source systems.
Various abstraction and transformation techniques are used to resolve differences between source and consumer formats and semantics. Data virtualization, both as a concept and as software, is a subset of data integration and is commonly used within business intelligence, service-oriented architecture data services, cloud computing, enterprise search, and master data management.
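The abstraction described above can be illustrated with a minimal sketch. It is not taken from any particular product: the names `csv_source`, `json_source`, and `VirtualTable`, the column names, and the sample data are all hypothetical, and in-memory strings stand in for remote source systems. Each adapter maps its source format onto a common set of consumer-facing field names, and the sources are read only when the consumer asks for rows, so nothing is copied into a separate store ahead of time.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Source = Callable[[], Iterator[Row]]


def csv_source(text: str) -> Source:
    """Hypothetical adapter: exposes CSV records under consumer-facing names."""
    def read() -> Iterator[Row]:
        for rec in csv.DictReader(io.StringIO(text)):
            # Transformation step: rename source columns and convert types.
            yield {"customer_id": int(rec["CUST_ID"]), "name": rec["FULL_NAME"]}
    return read


def json_source(text: str) -> Source:
    """Hypothetical adapter: exposes JSON records under the same names."""
    def read() -> Iterator[Row]:
        for rec in json.loads(text):
            yield {"customer_id": rec["id"], "name": rec["name"].strip()}
    return read


class VirtualTable:
    """Presents several sources as one table; reads them only on request."""

    def __init__(self, *sources: Source) -> None:
        self._sources = sources

    def rows(self) -> Iterator[Row]:
        for read in self._sources:
            yield from read()

    def where(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.rows() if predicate(row)]


if __name__ == "__main__":
    customers = VirtualTable(
        csv_source("CUST_ID,FULL_NAME\n1,Ada\n"),
        json_source('[{"id": 2, "name": " Grace "}]'),
    )
    # The consumer never sees which format or location a row came from.
    for row in customers.rows():
        print(row)
```

The consumer works only with `customer_id` and `name`; whether a row originated as CSV or JSON is hidden behind the adapters. A real system would additionally push filters down to the sources rather than filtering in the consumer, which this sketch does not attempt.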
Data virtualization software provides some or all of the following capabilities:
Data virtualization software may include functions for development, operation, and/or management.
Benefits include:
Drawbacks include:
Some data virtualization technologies include:
Enterprise information integration (EII), a term first coined by Metamatrix (now known as Red Hat JBoss Data Virtualization), and federated database systems are terms used by some vendors to describe a core element of data virtualization: the capability to create relational JOINs in a federated VIEW.
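A relational JOIN in a federated VIEW can be sketched with Python's standard `sqlite3` module, using two attached databases as stand-ins for independent source systems. This is only an illustration under stated assumptions: the schema names `crm` and `billing`, the tables, the view `customer_totals`, and the sample rows are all hypothetical, and commercial data virtualization products federate across far more diverse sources than two SQLite databases.

```python
import sqlite3
from typing import List, Tuple


def build_federated_view() -> sqlite3.Connection:
    """Attach two separate databases and define one view that joins them."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates its own independent database,
    # standing in here for a separate source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # SQLite only lets TEMP views reference tables in other attached
    # databases, so the federated view is created as a TEMP view.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def customer_totals(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Query the view; the join is evaluated against the sources at query time."""
    return conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    connection = build_federated_view()
    print(customer_totals(connection))
```

Because the view stores no data of its own, a row inserted into `billing.invoices` after the view is defined appears in the next query against `customer_totals` without any reload step.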