Schema matching

The terms schema matching and mapping are often used interchangeably. For this article, we differentiate the two as follows: Schema matching is the process of identifying that two objects are semantically related (scope of this article) while mapping refers to the transformations between the objects. For example, in the two schemas DB1.Student (Name, SSN, Level, Major, Marks) and DB2.Grad-Student (Name, ID, Major, Grades); possible matches would be: DB1.Student ≈ DB2.Grad-Student; DB1.SSN = DB2.ID etc. and possible transformations or mappings would be: DB1.Marks to DB2.Grades (100-90 A; 90-80 B: etc.).

Automating these two approaches has been one of the fundamental tasks of data integration. In general it is not possible to determine fully automatically the different correspondences between two schemas, primarily because of the differing and often not explicated or documented semantics of the two schemas.

Among others, common challenges to automating matching and mapping have been previously classified in especially for relational DB schemas; and in – a fairly comprehensive list of heterogeneity not limited to the relational model recognizing schematic vs semantic differences/heterogeneity. Most of these heterogeneities exist because schemas use different representations or definitions to represent the same information (schema conflicts); OR different expressions, units, and precision result in conflicting representations of the same data (data conflicts). Research in schema matching seeks to provide automated support to the process of finding semantic matches between two schemas. This process is made harder due to heterogeneities at the following levels

Discusses a generic methodology for the task of schema integration or the activities involved. According to the authors, one can view the integration

Approaches to schema integration can be broadly classified as ones that exploit either just schema information or schema and instance level information.

Schema-level matchers only consider schema information, not instance data. The available information includes the usual properties of schema elements, such as name, description, data type, relationship types (part-of, is-a, etc.), constraints, and schema structure. Working at the element (atomic elements like attributes of objects) or structure level (matching combinations of elements that appear together in a structure), these properties are used to identify matching elements in two schemas. Language-based or linguistic matchers use names and text (i.e., words or sentences) to find semantically similar schema elements. Constraint based matchers exploit constraints often contained in schemas. Such constraints are used to define data types and value ranges, uniqueness, optionality, relationship types and cardinalities, etc. Constraints in two input schemas are matched to determine the similarity of the schema elements.

...
Wikipedia