CD-Merge
Material for the Software Language Engineering and Model-based Software Engineering lecture, Bernhard Rumpe
CDMerge: Semantically Sound Merging of Class Diagrams for Software Component Integration
This explanation is intended for students who want to understand how multiple
UML class diagrams can be merged into a single coherent model, and what
conceptual rules CD-Merge applies when resolving conflicts, matching model
elements, and preserving information. The explanation is based
on [LRRS23].
Introduction
In the following, we assume that you are familiar with CD4Analysis. If not, you may first read its introduction, which is written in a similar style to this article.
CD-Merge is a tool for integrating multiple class diagrams into a single, consistent, and semantically meaningful model. It addresses a central challenge in model-based software engineering: when different teams, components, or tools produce separate class diagrams, these must eventually be merged into a shared representation of the system. CD-Merge follows a deterministic merge strategy based on the open-world semantics of class diagrams, meaning that incompleteness is permitted and the merging process does not rely on having complete knowledge of the entire system.
A central concept in software engineering is Divide and Conquer: systems are separated into submodules that exhibit high cohesion and low coupling. The same principle applies to class diagram modeling. Rather than treating the entire system as one large model, each submodule can be modeled independently. By composing these independently developed diagrams, it becomes possible to check whether the submodules are compatible and to construct a complete system model on demand when needed. A practical example of this approach is MaCoCo, a large class diagram created for a university information system.
The merging operator ⊕ combines two class diagrams into a single diagram whose result contains all information provided by both inputs. Nothing is removed and nothing new is added; the merged diagram is therefore the exact combination of both diagrams. A merge is only possible when the information in the two diagrams is compatible. If contradictions occur—such as conflicting attribute types, incompatible inheritance relations, or inconsistent cardinalities—the merge cannot be performed. Any system that satisfies the merged diagram (A ⊕ B) automatically satisfies both diagrams A and B, reflecting the intention of the merge operator: the merged model captures exactly what both diagrams jointly require.
In this example, two independently developed class diagrams are merged using the
CD-merge operator. The first diagram, Management, introduces the classes
Employee, Professor, and Lecture, along with their attributes and the
association between Professor and Lecture. The cardinality on the professor
side is unspecified, while the lecture side requires each professor to hold at
least one lecture. The second diagram, Teaching, extends the domain with the
class Student, adds the attribute credits to Lecture, and provides the
same association between Professor and Lecture, this time specifying the
missing cardinality on the Professor side. It also introduces a named
association (attendance) between Lecture and Student.
When both diagrams are merged, all compatible information is combined into the single class diagram University. The merged diagram contains:
- the inheritance relationship from Professor to Employee,
- all attributes contributed by both diagrams for Professor and Lecture,
- the refined cardinalities and role names of the Professor–Lecture association,
- the additional Lecture–Student association from the teaching diagram.
Background
Class diagrams define the structure of object-oriented systems by describing classes, their attributes, and the links that may exist between their instances. Formally, the semantics of a class diagram is the set of all object structures that conform to the constraints imposed by that diagram. This mapping is expressed as:
sem : CD → ℘(OS)
where each class diagram corresponds to a set of valid object graphs.
The CDMerge operator takes two models, A and B, as input and produces a
merged model A ⊕ B. The operator is defined such that the merged model’s
semantics correspond to a subset of the intersection of the input diagrams’
semantics:
sem(A ⊕ B) ⊆ sem(A) ∩ sem(B)
This means the merged diagram describes object structures that satisfy both A and B simultaneously.
Two different interpretations of class diagram semantics can be distinguished: closed-world and open-world. Under closed-world semantics, only the elements that are explicitly modeled in the class diagram are permitted. Consequently, merging cannot be interpreted as an intersection of semantics under this view: an object structure that uses an element modeled only in B is already excluded by A, so the intersection would rule out exactly the elements that the merge is supposed to contribute. Open-world semantics, in contrast, treats missing information as underspecification. If something is not explicitly modeled, it is still allowed as long as it does not contradict what is specified. This interpretation is particularly useful for model evolution and for composing independently developed diagrams.
The example illustrates these differences. The object structure on the left
contains an instance tony : Manager with a managedBy link. This structure is
not an instance of the given class diagram under closed-world semantics, because
the diagram does not explicitly include such an object configuration. In
particular, it does not specify that a Manager may be linked to itself via the
managedBy association. Since closed-world semantics only permits what is
explicitly modeled, this object structure is rejected. However, under open-world
semantics, the same structure is permitted. Although the diagram does not
explicitly state that a Manager can be linked to itself, it also does not
prohibit such a configuration. Therefore, open-world semantics does not exclude
the possibility that the Manager class will be refined or extended in the
future, for example by adding inheritance from Employee, which would naturally
enable self-links through the managedBy association.
A merged class diagram can be understood as a minimal expansion of its input
diagrams. To formalize this, a class diagram A’ is considered an expansion
of A if:
- all classes, attributes, and associations defined in A also appear in A’,
- all inheritance relations from A are preserved in A’.
However, A’ may also introduce additional elements (e.g., new attributes, specified cardinalities, extra associations, etc.).
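The expansion relation can be illustrated with a small check. This is only a sketch under simplifying assumptions: the `ClassDiagram` data structure and the function `is_expansion` are hypothetical and not part of the CD4Analysis API, associations are reduced to name triples, and inheritance is compared on direct edges only.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDiagram:
    # class name -> {attribute name -> type name}
    classes: dict = field(default_factory=dict)
    # set of (subclass, superclass) pairs
    inheritance: set = field(default_factory=set)
    # set of (left class, association name, right class) triples
    associations: set = field(default_factory=set)


def is_expansion(a: ClassDiagram, a_prime: ClassDiagram) -> bool:
    """Return True if a_prime contains everything that a specifies."""
    for name, attrs in a.classes.items():
        if name not in a_prime.classes:
            return False
        # Every attribute of A must reappear in A' with the same type.
        if any(a_prime.classes[name].get(attr) != typ for attr, typ in attrs.items()):
            return False
    # Simplification: inheritance and associations are compared as plain sets.
    return a.inheritance <= a_prime.inheritance and a.associations <= a_prime.associations


a = ClassDiagram(classes={"Person": {"name": "String"}})
a_prime = ClassDiagram(
    classes={"Person": {"name": "String", "id": "int"}, "Contract": {}},
    associations={("Person", "signs", "Contract")},
)
print(is_expansion(a, a_prime))  # True
print(is_expansion(a_prime, a))  # False
```

The second call returns False because the smaller diagram lacks the attribute id, the class Contract, and the association, so it cannot be an expansion of the larger one.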
In the context of merging, the composed model is an expansion of all input diagrams. More specifically, the merged diagram contains exactly the elements and specifications that occur in at least one of the original diagrams, but nothing beyond that. This ensures that the merge is the smallest model that still fully represents the information from all contributing diagrams.
Formally, this allows us to achieve a subset relation with respect to the semantics without adding any additional information / specification to the merged model:
sem_ow(A ⊕ B) ⊆ sem_ow(A) ∩ sem_ow(B)
Under open-world semantics, the merged model describes a set of object
structures that satisfy both input diagrams simultaneously.
However, despite not adding any additional information / specification to the
merged model, we cannot in general achieve:
sem(A ⊕ B) = sem(A) ∩ sem(B)
This can be demonstrated by the following example:
As can be seen, the object tony of type Manager is a legal instance of both
class diagram A and class diagram B. However, it is not permitted by the
merged model A ⊕ B as each Employee is required to have at least one Task
and every Manager is also an Employee.
Despite not adding any additional specification, the combination of syntactical model elements in the composed model further restricts the set of permitted instances. Consequently, we have a strict subset relation in this case:
sem(A ⊕ B) ⊂ sem(A) ∩ sem(B)
Merge Operation
To construct the composed model, the CDMerge process follows a structured sequence of steps to combine two class diagrams into a consistent merged result.
Merging Steps
1. Identify matching elements:
The first step is to determine which elements from the input diagrams correspond to each other. Matching is based purely on structural criteria:
- Classes match by name.
- Attributes match when both the owning class and the attribute name coincide.
- Associations match when their end classes, association name, and role names align.
2. Merge matching elements:
Once matches have been identified, CDMerge merges them, but only if they are compatible:
- Attributes are merged only if their types are compatible.
- Classes are merged only if their attributes and inheritance relations can be combined without conflict.
- Associations are merged only if the match is unambiguous and their cardinalities are compatible.
3. Add all unmatched elements:
After merging the compatible matches, all remaining elements—those that appear in only one of the input diagrams—are added directly to the resulting class diagram as well.
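The three steps can be sketched for classes and their attributes as follows. This is a minimal illustration, not the CD-Merge implementation: diagrams are plain dictionaries, "compatible" attribute types are assumed to mean "identical", and the names `merge_classes` and `MergeError` as well as the attributes other than credits are made up for the example.

```python
class MergeError(Exception):
    """Raised when matched elements are incompatible."""


def merge_classes(cd_a: dict, cd_b: dict) -> dict:
    """Merge two diagrams given as {class name: {attribute name: type}}."""
    merged = {}
    # Step 1: classes match by name.
    matched = cd_a.keys() & cd_b.keys()
    # Step 2: merge matching classes attribute by attribute.
    for cls in sorted(matched):
        attrs = dict(cd_a[cls])
        for attr, typ in cd_b[cls].items():
            if attr in attrs and attrs[attr] != typ:
                raise MergeError(f"{cls}.{attr}: {attrs[attr]} vs {typ}")
            attrs[attr] = typ
        merged[cls] = attrs
    # Step 3: copy classes that occur in only one input.
    for cd in (cd_a, cd_b):
        for cls, attrs in cd.items():
            if cls not in matched:
                merged[cls] = dict(attrs)
    return merged


# Hypothetical excerpts of the Management and Teaching diagrams.
management = {"Professor": {"name": "String"}, "Lecture": {"title": "String"}}
teaching = {"Lecture": {"credits": "int"}, "Student": {"matrNo": "int"}}

university = merge_classes(management, teaching)
print(sorted(university))     # ['Lecture', 'Professor', 'Student']
print(university["Lecture"])  # {'title': 'String', 'credits': 'int'}
```

Associations and inheritance would follow the same pattern with their own matching keys and compatibility checks.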
Merging Classes
Two diagrams, CD A and CD B, both define a class Person:
- CD A contains String name, String email, int birthdate
- CD B contains String name, int id
When merging these diagrams (CD A ⊕ CD B), CD-Merge constructs a single class
Person that contains the union of all attributes. The resulting diagram CD C
contains String name, String email, int birthdate, and int id.
Under open-world semantics, the merged class diagram must describe a set of object structures that is compatible with both diagrams:
- sem(A) = all object structures valid for CD A
- sem(B) = all object structures valid for CD B
The merged diagram must satisfy:
sem(A ⊕ B) ⊆ sem(A) ∩ sem(B)
In this example, the unified class structure is compatible with both original diagrams, as no attribute conflicts occur and the intersection is non-empty, making the merge valid.
Merging Attributes
As before, CD-Merge attempts to unify all attributes belonging to the same
class. However, the attribute birthdate appears in both diagrams with
different types:
- In CD A: int birthdate
- In CD B: Date birthdate
CD-Merge requires identically named attributes of the same class to have identical types. If this is not the case, the merge produces an error.
The conflict means that:
sem(A ⊕ B) = ∅
There exists no valid object structure that simultaneously satisfies both
definitions of birthdate. Hence, the diagrams cannot be merged under
open-world semantics. CD-Merge does not perform smart repairs such as:
- Automatically choosing one type,
- Inferring a more general common supertype,
- Renaming or refactoring attributes.
Instead, type mismatches are treated as hard failures.
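The strict treatment of type mismatches can be sketched as a check that only reports conflicts and never repairs them. The function `attribute_conflicts` is a hypothetical helper, and the attribute name in the example data is an assumption carried over from the previous Person example; only the two birthdate declarations are taken from the text.

```python
def attribute_conflicts(cls: str, attrs_a: dict, attrs_b: dict) -> list:
    """List every attribute that both diagrams declare with different types."""
    return [
        f"{cls}.{name}: {attrs_a[name]} (CD A) vs {attrs_b[name]} (CD B)"
        for name in sorted(attrs_a.keys() & attrs_b.keys())
        if attrs_a[name] != attrs_b[name]
    ]


person_a = {"name": "String", "birthdate": "int"}
person_b = {"name": "String", "birthdate": "Date"}

conflicts = attribute_conflicts("Person", person_a, person_b)
if conflicts:
    # No repair is attempted: the merge simply fails with a diagnostic.
    print("merge rejected:", conflicts)
```

Collecting all conflicts before failing is a design choice of this sketch; it lets a modeler see every mismatch at once instead of fixing them one by one.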
Merging Inheritance
This example illustrates how CD-Merge handles inheritance when the same classes appear in multiple diagrams but with different superclass relations.
Two class diagrams, CD A and CD B, define similar class structures:
- In CD A:
  - Employee is a direct subclass of Person.
  - Professor is a subclass of Employee.
  - Professor has the attribute boolean tenured.
- In CD B:
  - Professor is shown as a subclass of Person.
  - Professor has two attributes: int empNo, String phone
Even though the hierarchy differs slightly, both diagrams describe Professor
as some kind of Person.
CD-Merge can reconcile inheritance chains even if the diagrams disagree about intermediate superclasses. It only requires the subclass relation to be consistent, not structurally identical.
During merging:
- All attributes of a class and its superclasses must be preserved, regardless of where they appear in the input diagrams.
- Since CD B places two attributes (empNo and phone) directly in Professor, but CD A defines empNo in the superclass Employee, CD-Merge performs an attribute pull-up: empNo is moved to the least common superclass (Employee).
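The attribute pull-up can be illustrated with a small sketch that starts from the naive union of both diagrams and removes attributes a superclass already declares. The functions `superclasses` and `pull_up` are hypothetical, the dictionary encoding is an assumption, and the type int for empNo in Employee is assumed to equal the one given for CD B.

```python
def superclasses(cls: str, parents: dict) -> set:
    """All direct and transitive superclasses of cls."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def pull_up(classes: dict, parents: dict) -> dict:
    """Drop attributes that a superclass already declares with the same type."""
    result = {}
    for cls, attrs in classes.items():
        inherited = {}
        for sup in superclasses(cls, parents):
            inherited.update(classes.get(sup, {}))
        result[cls] = {n: t for n, t in attrs.items() if inherited.get(n) != t}
    return result


# Naive union of CD A and CD B before the pull-up.
classes = {
    "Person": {},
    "Employee": {"empNo": "int"},
    "Professor": {"tenured": "boolean", "empNo": "int", "phone": "String"},
}
parents = {"Employee": {"Person"}, "Professor": {"Employee", "Person"}}

print(pull_up(classes, parents)["Professor"])  # {'tenured': 'boolean', 'phone': 'String'}
```

Professor still has empNo after the pull-up, but only through inheritance from Employee, so no attribute is lost.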
Inheritance merging must preserve the fundamental structural rules of UML class hierarchies. One of the most important constraints is:
No circular inheritance hierarchy; in other words, a class diagram must not contain cycles in its inheritance graph.
CD-Merge therefore rejects any merge whose result would contain such a cycle, because no consistent merged hierarchy can be created that satisfies both input models. Consider the following input diagrams:
- In CD A: Lecturer is a subclass of Professor.
- In CD B: Professor is a subclass of Lecturer.
Each diagram individually is well-formed. However, merging these relationships results in:
Professor → Lecturer → Professor
which forms a cycle. The output diagram CD C is invalid, and CD-Merge reports
an error.
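A cycle check of this kind can be sketched with a depth-first search over the merged inheritance graph. The function `find_cycle` and the dictionary encoding (subclass mapped to its set of direct superclasses) are assumptions for illustration; the source does not say how CD-Merge implements the check.

```python
def find_cycle(parents: dict):
    """Return one inheritance cycle as a list of class names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(cls, path):
        color[cls] = GREY  # cls is on the current search path
        path.append(cls)
        for sup in sorted(parents.get(cls, ())):
            state = color.get(sup, WHITE)
            if state == GREY:
                # Reached a class that is still on the path: cycle found.
                return path[path.index(sup):] + [sup]
            if state == WHITE:
                cycle = visit(sup, path)
                if cycle:
                    return cycle
        path.pop()
        color[cls] = BLACK  # fully explored, no cycle through cls
        return None

    for cls in sorted(parents):
        if color.get(cls, WHITE) == WHITE:
            cycle = visit(cls, [])
            if cycle:
                return cycle
    return None


cd_a = {"Lecturer": {"Professor"}}
cd_b = {"Professor": {"Lecturer"}}

# Union of the inheritance edges of both diagrams.
merged = {}
for cd in (cd_a, cd_b):
    for sub, sups in cd.items():
        merged.setdefault(sub, set()).update(sups)

print(find_cycle(cd_a), find_cycle(cd_b))  # None None
print(" -> ".join(find_cycle(merged)))
```

Each input alone yields no cycle, while the union of their edges does, which matches the situation described above.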
Merging Associations
In the given situation, CD-Merge identifies that both associations connect the
same pair of classes with compatible roles, and the merged association in CD C
therefore contains all information present in the input diagrams. The merge is
valid because both diagrams describe compatible associations, no conflicting
multiplicities or contradictory role names occur, the merged association
satisfies all constraints from both diagrams, and the resulting relationship
retains a clear and unambiguous meaning.
The next example illustrates a situation where CD-Merge must reject the merge because
the associations in the input diagrams cannot be matched uniquely. Even though
the class names match, the structure of the associations in CD B creates
uncertainty about how the merge should proceed.
CD A contains exactly one association between Person and Contract. This
association is unambiguous within CD A. CD B contains two different
associations between Person and Contract. When CD-Merge attempts to combine
CD A and CD B, it tries to determine which association in CD B corresponds
to the single association in CD A, but this cannot be resolved. There is no
unique and unambiguous choice: the association in CD A could be matched with
the employment association or with the contact association. Since it only
partially matches both and fully matches neither, the merge is rejected.
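The uniqueness requirement can be sketched as follows. This is a simplified illustration: the `Association` type and both functions are hypothetical, role names are left out, and the association in CD A is assumed to be unnamed, which the text does not state explicitly.

```python
from typing import NamedTuple, Optional


class Association(NamedTuple):
    left: str
    right: str
    name: Optional[str] = None  # None means "not specified"


def candidates(assoc: Association, others: list) -> list:
    """Associations in `others` that `assoc` could be matched with."""
    return [
        o for o in others
        if {o.left, o.right} == {assoc.left, assoc.right}
        and (assoc.name is None or o.name is None or assoc.name == o.name)
    ]


def match_unique(assoc: Association, others: list) -> Optional[Association]:
    """Return the single match, None if there is none, or fail if ambiguous."""
    found = candidates(assoc, others)
    if len(found) > 1:
        raise ValueError(f"ambiguous match: {len(found)} candidates for {assoc}")
    return found[0] if found else None


in_a = Association("Person", "Contract")  # assumed to be unnamed
in_b = [
    Association("Person", "Contract", "employment"),
    Association("Person", "Contract", "contact"),
]

try:
    match_unique(in_a, in_b)
except ValueError as err:
    print("merge rejected:", err)
```

If the association in CD A carried the name employment, the same function would return the first association of CD B as the unique match.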
Cardinality constraints specify how many related instances an implementation
must support, and because these constraints are typically interpreted as
contracts for code generation, CD-Merge must enforce them strictly. From a
formal semantics standpoint, taking the intersection of ranges would be
acceptable—for example, merging 1..* with 1..4 could yield 1..4, and
merging 0..1 with * could yield 0..1. In practice, however, a cardinality
expresses a commitment that generated code must uphold. If CD A guarantees
that a Contract must be linked to at least one Person, the merged model must
preserve that requirement. If CD B guarantees that a Person may have
arbitrarily many workContracts, the merged model must also allow an unbounded
number. Intersecting ranges would remove permitted behaviors from both models
and violate the expectations of the developers of either diagram. A merge must
not silently strengthen or weaken these commitments, and therefore cardinalities
must match exactly or the merge fails.
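The difference between intersecting ranges and the strict rule can be made concrete with a short sketch. The function names are hypothetical, and the handling of an unspecified cardinality (None) follows the introductory example, where a missing cardinality is filled in by the other diagram.

```python
import math


def parse(card: str) -> tuple:
    """Turn '1..*', '0..1', '*' or '3' into a (lower, upper) pair."""
    if card == "*":
        return (0, math.inf)
    lower, _, upper = card.partition("..")
    upper = upper or lower  # '3' means exactly three
    return (int(lower), math.inf if upper == "*" else int(upper))


def intersect(a: str, b: str) -> tuple:
    """Range intersection: acceptable semantically, but not the strict rule."""
    (lo_a, hi_a), (lo_b, hi_b) = parse(a), parse(b)
    return (max(lo_a, lo_b), min(hi_a, hi_b))


def merge_strict(a, b):
    """Strict rule: two specified cardinalities must denote the same range."""
    if a is None or b is None:
        return a if b is None else b  # an unspecified side is refined by the other
    if parse(a) != parse(b):
        raise ValueError(f"cardinality conflict: {a} vs {b}")
    return a


print(intersect("1..*", "1..4"))     # (1, 4)
print(intersect("0..1", "*"))        # (0, 1)
print(merge_strict("1..*", "1..*"))  # 1..*
print(merge_strict(None, "1"))       # 1
try:
    merge_strict("1..*", "1..4")
except ValueError as err:
    print("merge rejected:", err)
```

The intersection silently turns 1..* into 1..4, which is exactly the kind of strengthened commitment the strict rule is meant to prevent.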
The next example illustrates a merge conflict that occurs when associations and
inheritance interact, resulting in ambiguous associations in the merged class
diagram. Each association introduces roles that define reading promises such as
getLecturer() and writing promises such as setLecturer(Lecturer l), which
are automatically generated from the association structure. Because roles form
part of the public API of a generated class, they cannot be altered during
merging; association targets and role names must therefore remain consistent
across diagrams. Otherwise, the merge would silently change which object a
getter or setter refers to, violating expected behavior.
In the given input diagrams, CD A specifies that Professor is a subclass of
Lecturer and defines an association
association holds [1] Professor (lecturer) <-> Course [*], while CD B uses
the same inheritance hierarchy but defines the association
association holds [1] Lecturer (lecturer) <-> Course [*]. When merged, the
result contains two associations with the same role name lecturer but attached
to different classes, creating an ambiguous role on the Course side. It is
unclear which association Course.getLecturer() should refer to or whether the
lecturer should be of type Lecturer or Professor.
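Such ambiguous roles can be detected with a simple scan over the merged associations. The sketch below uses a hypothetical tuple encoding (left class, left role, right class, right role) and a hypothetical function `ambiguous_roles`; it ignores cardinalities and does not model the inheritance relation itself.

```python
from collections import defaultdict


def ambiguous_roles(associations: list) -> dict:
    """Map (owner class, role name) to its targets when there are several."""
    targets = defaultdict(set)
    for left, left_role, right, right_role in associations:
        # A role name on one end is navigable from the class on the other end.
        if left_role:
            targets[(right, left_role)].add(left)
        if right_role:
            targets[(left, right_role)].add(right)
    return {key: sorted(t) for key, t in targets.items() if len(t) > 1}


merged = [
    ("Professor", "lecturer", "Course", None),  # from CD A
    ("Lecturer", "lecturer", "Course", None),   # from CD B
]
print(ambiguous_roles(merged))
# {('Course', 'lecturer'): ['Lecturer', 'Professor']}
```

The result shows that Course would expose the role lecturer with two different target types, so a generated getLecturer() would have no well-defined return type.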
Merging Generated Code
CD-Merge integrates with a modular code-generation workflow. When several class
diagrams (CDs) are merged, the resulting model should still fit seamlessly into
a larger software system composed of generated and handwritten Java code. A core
goal of CD-Merge is that merging class diagrams must not break existing
generated artifacts. If diagrams like CD1 and CD2 already have generated
data classes, compiled class files, or handwritten extensions, all of these must
remain valid and reusable after merging into CD M. This ensures that no
regeneration is required, handwritten code continues to bind correctly, and
existing modules still compile and run. In this way, merging supports a modular
architecture where each class diagram can represent an independent subsystem.
Two important concepts shown in the figure are:
- Late binding of handwritten code: Handwritten code (marked as «hc») is added after code generation and must not be overwritten or regenerated.
- Reuse of pre-generated/compiled submodules as black boxes: Subsystems that have already been generated, compiled or packaged into modules can be reused directly without re-generating or re-compiling them. The merge operation sees these modules as black boxes with well-defined external interfaces.
The figure illustrates a complete incremental workflow:
- CD1 is generated into Java code, compiled, and packaged into Module 1.
- Similarly, CD2 produces its own generated and compiled module. CD2 also has handwritten extensions (extensions2) that enrich the generated data classes.
- The merge operator ⊕ combines the two class diagrams into a new, integrated diagram.
- From the merged diagram, new generated code (data M) and new compiled classes (classes M) are created, forming the basis of the overall system.
- Additional handwritten extensions (extensionsM) are bound into the final system in the same late-binding fashion.
- All modules are wired together without regenerating or modifying previously built components.
Conclusion
CDMerge brings the divide-and-conquer principle to class diagram modeling.
Instead of maintaining a single large and monolithic class diagram, developers
can decompose a system into multiple smaller, domain-specific component
diagrams. Using the merge operator (⊕), these components can then be
recombined into a consistent and unified model whenever integration is required.
A central strength of CDMerge lies in its semantically sound merging. Conflicting or incompatible model elements are not silently overridden or auto-repaired; instead, CDMerge explicitly detects such inconsistencies, ensuring that the formal semantics of the input diagrams are preserved.
As part of the CD4Analysis toolchain, CDMerge seamlessly integrates into existing MontiCore-based modeling and generation workflows.
The capabilities of CDMerge were demonstrated in the MaCoCo case study, where a very large class diagram describing a university information system was manually decomposed into four component diagrams. CDMerge successfully merged these components back into a coherent and semantically valid overall model, highlighting its practical value for realistic, large-scale modeling scenarios.
Literature:
Further links on the tooling:
- MontiCore Language Workbench,
- MontiGem,
- MontiCore List of Language,
- CD4Analysis Tool Readme,
- CD4Analysis Tool.