CD-Merge

Material for the Software Language Engineering and Model-based Software Engineering lecture, Bernhard Rumpe

CDMerge: Semantically Sound Merging of Class Diagrams for Software Component Integration

This explanation is intended for students who want to understand how multiple UML class diagrams can be merged into a single coherent model, and what conceptual rules CD-Merge applies when resolving conflicts, matching model elements, and preserving information. The explanation is based on [LRRS23].

Introduction

In the following, we assume that you are familiar with CD4Analysis. If you are not, you may first read the introduction to CD4Analysis, which is written in a similar style to this article.

CD-Merge is a tool for integrating multiple class diagrams into a single, consistent, and semantically meaningful model. It addresses a central challenge in model-based software engineering: when different teams, components, or tools produce separate class diagrams, these must eventually be merged into a shared representation of the system. CD-Merge follows a deterministic merge strategy based on the open-world semantics of class diagrams, meaning that incompleteness is permitted and the merging process does not rely on having complete knowledge of the entire system.

A central concept in software engineering is Divide and Conquer: systems are separated into submodules that exhibit high cohesion and low coupling. The same principle applies to class diagram modeling. Rather than treating the entire system as one large model, each submodule can be modeled independently. By composing these independently developed diagrams, it becomes possible to check whether the submodules are compatible and to construct a complete system model on demand when needed. A practical example of this approach is MaCoCo, a large class diagram created for a university information system.

The merging operator ⊕ combines two class diagrams into a single diagram whose result contains all information provided by both inputs. Nothing is removed and nothing new is added; the merged diagram is therefore the exact combination of both diagrams. A merge is only possible when the information in the two diagrams is compatible. If contradictions occur—such as conflicting attribute types, incompatible inheritance relations, or inconsistent cardinalities—the merge cannot be performed. Any system that satisfies the merged diagram (A ⊕ B) automatically satisfies both diagrams A and B, reflecting the intention of the merge operator: the merged model captures exactly what both diagrams jointly require.

In this example, two independently developed class diagrams are merged using the CD-Merge operator. The first diagram, Management, introduces the classes Employee, Professor, and Lecture, along with their attributes and the association between Professor and Lecture. The cardinality on the Professor side is unspecified, while the Lecture side requires each professor to hold at least one lecture. The second diagram, Teaching, extends the domain with the class Student, adds the attribute credits to Lecture, and provides the same association between Professor and Lecture, this time specifying the missing cardinality on the Professor side. It also introduces a named association (attendance) between Lecture and Student.

When both diagrams are merged, all compatible information is combined into the single class diagram University. The merged diagram contains:

  • the inheritance relationship from Professor to Employee,
  • all attributes contributed by both diagrams for Professor and Lecture,
  • the refined cardinalities and role names of the Professor–Lecture association,
  • the additional Lecture–Student association from the teaching diagram.

Background

Class diagrams define the structure of object-oriented systems by describing classes, their attributes, and the links that may exist between their instances. Formally, the semantics of a class diagram is the set of all object structures that conform to the constraints imposed by that diagram. This mapping is expressed as:

sem : CD → ℘(OS)

where each class diagram corresponds to a set of valid object graphs.

The CDMerge operator takes two models, A and B, as input and produces a merged model A ⊕ B. The operator is defined such that the merged model’s semantics correspond to a subset of the intersection of the input diagrams’ semantics:

sem(A ⊕ B) ⊆ sem(A) ∩ sem(B)

This means the merged diagram describes object structures that satisfy both A and B simultaneously.

Two different interpretations of class diagram semantics can be distinguished: closed-world and open-world. Under closed-world semantics, only the elements that are explicitly modeled in the class diagram are permitted. Consequently, merging cannot be interpreted as an intersection of semantics under this view: an object of a class that occurs only in diagram B is not permitted by diagram A, so the merged diagram, which contains that class, would allow object structures that lie outside the intersection. Open-world semantics, in contrast, treats missing information as underspecification. If something is not explicitly modeled, it is still allowed as long as it does not contradict what is specified. This interpretation is particularly useful for model evolution and for composing independently developed diagrams.

The example illustrates these differences. The object structure on the left contains an instance tony : Manager with a managedBy link. Under closed-world semantics, this structure is not an instance of the given class diagram, because the diagram does not explicitly include such an object configuration. In particular, it does not specify that a Manager may be linked to itself via the managedBy association. Since closed-world semantics only permits what is explicitly modeled, this object structure is rejected. Under open-world semantics, however, the same structure is permitted. Although the diagram does not explicitly state that a Manager can be linked to itself, it also does not prohibit such a configuration. Open-world semantics therefore leaves room for the Manager class to be refined or extended in the future, for example by adding inheritance from Employee, which would naturally enable self-links through the managedBy association.

A merged class diagram can be understood as a minimal expansion of its input diagrams. To formalize this, a class diagram A’ is considered an expansion of A if:

  • all classes, attributes, and associations defined in A also appear in A’,
  • all inheritance relations from A are preserved in A’.

Beyond that, A’ may introduce additional elements (e.g., new attributes, specified cardinalities, or extra associations).

In the context of merging, the composed model is an expansion of all input diagrams. More specifically, the merged diagram contains exactly the elements and specifications that occur in at least one of the original diagrams, but nothing beyond that. This ensures that the merge is the smallest model that still fully represents the information from all contributing diagrams.
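The expansion conditions can be checked mechanically. The following is a minimal Python sketch, not part of CD-Merge itself: the dictionary layout, the function name is_expansion, and the attribute type int for credits are assumptions made for illustration, and the two diagrams are simplified versions of Management and University from the introduction.

```python
def is_expansion(a, a_prime):
    """Return True if a_prime keeps every element stated in a (and may add more)."""
    classes_kept = all(
        cls in a_prime["classes"]
        and attrs.items() <= a_prime["classes"][cls].items()
        for cls, attrs in a["classes"].items()
    )
    return (
        classes_kept
        and a["extends"] <= a_prime["extends"]            # inheritance preserved
        and a["associations"] <= a_prime["associations"]  # associations preserved
    )

# Simplified diagrams; attribute lists are abbreviated (assumption).
management = {
    "classes": {"Employee": {}, "Professor": {}, "Lecture": {}},
    "extends": {("Professor", "Employee")},
    "associations": {("Professor", "Lecture")},
}
university = {
    "classes": {"Employee": {}, "Professor": {},
                "Lecture": {"credits": "int"}, "Student": {}},
    "extends": {("Professor", "Employee")},
    "associations": {("Professor", "Lecture"), ("Lecture", "Student")},
}

print(is_expansion(management, university))  # True
print(is_expansion(university, management))  # False
```

The sketch compares only directly stated elements. It does not account for attributes that become available through inheritance, which a full check would have to consider.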

Formally, this allows us to achieve a subset relation with respect to the semantics without adding any additional information / specification to the merged model:

sem_ow(A ⊕ B) ⊆ sem_ow(A) ∩ sem_ow(B)

Under open-world semantics, the merged model describes a set of object structures that satisfy both input diagrams simultaneously.

However, despite not adding any additional information or specification to the merged model, we cannot in general achieve equality:

sem(A ⊕ B) = sem(A) ∩ sem(B)

This can be demonstrated by the following example:

As can be seen, the object tony of type Manager is a legal instance of both class diagram A and class diagram B. However, it is not permitted by the merged model A ⊕ B as each Employee is required to have at least one Task and every Manager is also an Employee.

Despite not adding any additional specification, the combination of syntactical model elements in the composed model further restricts the set of permitted instances. Consequently, we have a strict subset relation in this case:

sem(A ⊕ B) ⊂ sem(A) ∩ sem(B)
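The strict subset relation can be reproduced with a very small model checker. The sketch below is an illustration under assumptions: since the figure is not reproduced here, it assumes that diagram A states "every Employee has at least one Task" and that diagram B states "Manager extends Employee". The Diagram class and the functions conforms and merge are hypothetical helpers, not CD-Merge API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Diagram:
    # subclass -> direct superclass, as stated by this diagram
    supers: dict = field(default_factory=dict)
    # classes whose instances need at least one linked Task
    needs_task: frozenset = frozenset()

def ancestors(cd, cls):
    """Return cls together with all transitive superclasses known to cd."""
    result = [cls]
    while result[-1] in cd.supers:
        result.append(cd.supers[result[-1]])
    return result

def conforms(cd, obj_type, task_count):
    """Open-world check: only constraints stated in cd are enforced."""
    constrained = any(c in cd.needs_task for c in ancestors(cd, obj_type))
    return task_count >= 1 or not constrained

def merge(a, b):
    """Union of all stated information (this tiny example has no conflicts)."""
    return Diagram({**a.supers, **b.supers}, a.needs_task | b.needs_task)

cd_a = Diagram(needs_task=frozenset({"Employee"}))  # Employee needs a Task
cd_b = Diagram(supers={"Manager": "Employee"})      # Manager extends Employee
merged = merge(cd_a, cd_b)

# tony : Manager without any Task
in_a = conforms(cd_a, "Manager", 0)
in_b = conforms(cd_b, "Manager", 0)
in_merged = conforms(merged, "Manager", 0)
print(in_a, in_b, in_merged)  # True True False
```

Neither input diagram alone connects Manager to the Task requirement. Only the combination of the inheritance relation from B with the cardinality from A excludes tony.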

Merge Operation

To construct the composed model, the CDMerge process follows a structured sequence of steps to combine two class diagrams into a consistent merged result.

Merging Steps

1. Identify matching elements:

The first step is to determine which elements from the input diagrams correspond to each other. Matching is based purely on structural criteria:

  • Classes match by name.
  • Attributes match when both the owning class and the attribute name coincide.
  • Associations match when their end classes, association name, and role names align.

2. Merge matching elements:

Once matches have been identified, CDMerge merges them, but only if they are compatible:

  • Attributes are merged only if their types are compatible.
  • Classes are merged only if their attributes and inheritance relations can be combined without conflict.
  • Associations are merged only if the match is unambiguous and their cardinalities are compatible.

3. Add all unmatched elements:

After merging the compatible matches, all remaining elements—those that appear in only one of the input diagrams—are added directly to the resulting class diagram as well.
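The first and third step can be sketched by computing a matching key for every element and partitioning the keys. This is a simplified Python illustration, not the CD-Merge implementation: the function element_keys, the tuple layout, and the type int for credits are assumptions. Role names are left out, and the diagrams are reduced versions of Management and Teaching from the introduction.

```python
def element_keys(classes, associations):
    """Return the matching keys of all elements of one diagram.

    classes:      {class name: {attribute name: type}}
    associations: list of (left class, right class, association name or None)
    """
    keys = set()
    for cls, attrs in classes.items():
        keys.add(("class", cls))                      # classes match by name
        for attr in attrs:
            keys.add(("attribute", cls, attr))        # owner + attribute name
    for left, right, name in associations:
        keys.add(("association", left, right, name))  # ends + association name
    return keys

management = element_keys(
    {"Employee": {}, "Professor": {}, "Lecture": {}},
    [("Professor", "Lecture", None)],
)
teaching = element_keys(
    {"Professor": {}, "Lecture": {"credits": "int"}, "Student": {}},
    [("Professor", "Lecture", None), ("Lecture", "Student", "attendance")],
)

matched = management & teaching          # step 1: candidates for step 2
only_management = management - teaching  # step 3: added unchanged
only_teaching = teaching - management    # step 3: added unchanged

for key in sorted(matched, key=str):
    print(key)
```

Step 2 then inspects each matched key and either merges the two elements or rejects the merge. The sections below describe the compatibility rules for each element kind.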

Merging Classes

Two diagrams, CD A and CD B, both define a class Person:

  • CD A contains
    • String name
    • String email
    • int birthdate
  • CD B contains
    • String name
    • int id

When merging these diagrams (CD A ⊕ CD B), CD-Merge constructs a single class Person that contains the union of all attributes.

The resulting diagram CD C contains:

  • String name
  • String email
  • int birthdate
  • int id

Under open-world semantics, the merged class diagram must describe a set of object structures that is compatible with both diagrams:

  • sem(A) = all object structures valid for CD A
  • sem(B) = all object structures valid for CD B

The merged diagram must satisfy:

sem(A ⊕ B) ⊆ sem(A) ∩ sem(B)

In this example, the unified class structure is compatible with both original diagrams, as no attribute conflicts occur and the intersection is non-empty, making the merge valid.
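For the conflict-free case, the attribute union amounts to a dictionary union per class. The following Python sketch uses the Person example from above. The representation of a diagram as {class: {attribute: type}} and the function name merge_classes are assumptions for illustration.

```python
cd_a = {"Person": {"name": "String", "email": "String", "birthdate": "int"}}
cd_b = {"Person": {"name": "String", "id": "int"}}

def merge_classes(a, b):
    """Merge two {class: {attribute: type}} diagrams that have no type conflicts."""
    merged = {}
    for cls in a.keys() | b.keys():
        # the dict union keeps every attribute that occurs in at least one input
        merged[cls] = {**a.get(cls, {}), **b.get(cls, {})}
    return merged

cd_c = merge_classes(cd_a, cd_b)
print(cd_c["Person"])
# {'name': 'String', 'email': 'String', 'birthdate': 'int', 'id': 'int'}
```

The shared attribute name appears once in the result, because both diagrams declare it with the same type.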

Merging Attributes

As before, CD-Merge attempts to unify all attributes belonging to the same class. However, the attribute birthdate appears in both diagrams with different types:

  • In CD A:
    • int birthdate
  • In CD B:
    • Date birthdate

CD-Merge requires identically named attributes of the same class to have identical types. If this is not the case, the merge produces an error.

The conflict means that:

sem(A ⊕ B) = ∅

There exists no valid object structure that simultaneously satisfies both definitions of birthdate. Hence, the diagrams cannot be merged under open-world semantics. CD-Merge does not perform smart repairs such as:

  • Automatically choosing one type,
  • Inferring a more general common supertype,
  • Renaming or refactoring attributes.

Instead, type mismatches are treated as hard failures.
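The hard-failure behavior can be sketched as follows. This is an illustrative Python fragment under assumptions: MergeError and merge_attributes are hypothetical names, and the real tool may report conflicts differently.

```python
class MergeError(Exception):
    """Raised when two diagrams contain contradictory information."""

def merge_attributes(owner, attrs_a, attrs_b):
    """Unify the attributes of one class; identical names need identical types."""
    merged = dict(attrs_a)
    for name, typ in attrs_b.items():
        if name in merged and merged[name] != typ:
            # no repair: neither type is picked, generalized, or renamed
            raise MergeError(
                f"{owner}.{name}: type {merged[name]} conflicts with {typ}"
            )
        merged[name] = typ
    return merged

try:
    merge_attributes("Person", {"birthdate": "int"}, {"birthdate": "Date"})
    outcome = "merged"
except MergeError as err:
    outcome = f"rejected ({err})"

print(outcome)  # rejected (Person.birthdate: type int conflicts with Date)
```

Raising an exception instead of returning a partially merged class mirrors the rule that a conflicting merge yields no result at all.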

Merging Inheritance

This example illustrates how CD-Merge handles inheritance when the same classes appear in multiple diagrams but with different superclass relations.

Two class diagrams, CD A and CD B, define similar class structures:

  • In CD A:

    • Employee is a direct subclass of Person and has the attribute int empNo.
    • Professor is a subclass of Employee.
    • Professor has the attribute boolean tenured.
  • In CD B:

    • Professor is shown as a subclass of Person.
    • Professor has two attributes:

      • int empNo
      • String phone

Even though the hierarchy differs slightly, both diagrams describe Professor as some kind of Person.

CD-Merge can reconcile inheritance chains even if the diagrams disagree about intermediate superclasses. It only requires the subclass relation to be consistent, not structurally identical.

During merging:

  1. All attributes of a class and its superclasses must be preserved, regardless of where they appear in the input diagrams.
  2. CD B places two attributes (empNo and phone) directly in Professor, whereas CD A defines empNo in the superclass Employee. CD-Merge therefore performs an attribute pull-up: empNo is moved to the least common superclass (Employee), and Professor inherits it from there.

Inheritance merging must preserve the fundamental structural rules of UML class hierarchies. One of the most important constraints is that the inheritance hierarchy must not be circular; in other words, a class diagram must not contain cycles in its inheritance graph.

Consider the following situation:

  • In CD A: Lecturer is a subclass of Professor.

  • In CD B: Professor is a subclass of Lecturer.

Each diagram individually is well-formed. However, merging these relationships results in:

Professor → Lecturer → Professor

which forms a cycle. CD-Merge therefore rejects this situation, because no consistent merged hierarchy can be created that satisfies both input models. The output diagram CD C would be invalid, and CD-Merge reports an error.
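A cycle check on the combined inheritance edges is enough to detect this case. The sketch below uses a depth-first search over (subclass, superclass) pairs. The function name and the edge representation are assumptions for illustration, not taken from the tool.

```python
def has_inheritance_cycle(extends):
    """Detect a cycle in a set of (subclass, superclass) edges via depth-first search."""
    graph = {}
    for sub, sup in extends:
        graph.setdefault(sub, set()).add(sup)

    visiting, done = set(), set()

    def visit(cls):
        if cls in done:
            return False
        if cls in visiting:
            return True  # reached a class that is still on the current path
        visiting.add(cls)
        found = any(visit(sup) for sup in graph.get(cls, ()))
        visiting.discard(cls)
        done.add(cls)
        return found

    return any(visit(cls) for cls in list(graph))

cd_a = {("Lecturer", "Professor")}  # Lecturer extends Professor
cd_b = {("Professor", "Lecturer")}  # Professor extends Lecturer

print(has_inheritance_cycle(cd_a))         # False
print(has_inheritance_cycle(cd_b))         # False
print(has_inheritance_cycle(cd_a | cd_b))  # True
```

Both inputs pass the check on their own. Only the union of their edges contains the cycle, which is why the error can appear only at merge time.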

Merging Associations

In the given situation, CD-Merge identifies that both associations connect the same pair of classes with compatible roles, and the merged association in CD C therefore contains all information present in the input diagrams. The merge is valid because both diagrams describe compatible associations, no conflicting multiplicities or contradictory role names occur, the merged association satisfies all constraints from both diagrams, and the resulting relationship retains a clear and unambiguous meaning.

The next example illustrates a situation where CD-Merge must reject the merge because the associations in the input diagrams cannot be matched uniquely. Even though the class names match, the structure of the associations in CD B creates uncertainty about how the merge should proceed.

CD A contains exactly one association between Person and Contract. This association is unambiguous within CD A. CD B contains two different associations between Person and Contract. When CD-Merge attempts to combine CD A and CD B, it tries to determine which association in CD B corresponds to the single association in CD A, but this cannot be resolved. There is no unique and unambiguous choice: the association in CD A could be matched with the employment association or with the contact association. Since it only partially matches both and fully matches neither, the merge is rejected.
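The ambiguity check can be sketched as a search for partial matches. The fragment below is a hedged illustration: it assumes that the association in CD A leaves its name and roles unspecified, and that two associations partially match when they connect the same classes and do not disagree on any field that both specify. The names MergeError, partially_matches, and find_match are hypothetical.

```python
class MergeError(Exception):
    """Raised when an association cannot be matched uniquely."""

FIELDS = ("name", "left_role", "right_role")

def partially_matches(x, y):
    """Same end classes, and no field that both sides specify differs."""
    if (x["left"], x["right"]) != (y["left"], y["right"]):
        return False
    return all(
        x.get(f) is None or y.get(f) is None or x[f] == y[f]
        for f in FIELDS
    )

def find_match(assoc, candidates):
    """Return the unique partner of assoc, None if there is none, or raise."""
    hits = [c for c in candidates if partially_matches(assoc, c)]
    if len(hits) > 1:
        names = ", ".join(str(h.get("name")) for h in hits)
        raise MergeError(
            f"ambiguous match for {assoc['left']}-{assoc['right']}: {names}"
        )
    return hits[0] if hits else None

cd_a_assoc = {"left": "Person", "right": "Contract"}
cd_b_assocs = [
    {"left": "Person", "right": "Contract", "name": "employment"},
    {"left": "Person", "right": "Contract", "name": "contact"},
]

try:
    find_match(cd_a_assoc, cd_b_assocs)
    result = "matched"
except MergeError as err:
    result = str(err)

print(result)  # ambiguous match for Person-Contract: employment, contact
```

If CD A named its association employment, only one candidate would remain and the merge could proceed.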

Cardinality constraints specify how many related instances an implementation must support, and because these constraints are typically interpreted as contracts for code generation, CD-Merge must enforce them strictly. From a formal semantics standpoint, taking the intersection of ranges would be acceptable—for example, merging 1..* with 1..4 could yield 1..4, and merging 0..1 with * could yield 0..1. In practice, however, a cardinality expresses a commitment that generated code must uphold. If CD A guarantees that a Contract must be linked to at least one Person, the merged model must preserve that requirement. If CD B guarantees that a Person may have arbitrarily many workContracts, the merged model must also allow an unbounded number. Intersecting ranges would remove permitted behaviors from both models and violate the expectations of the developers of either diagram. A merge must not silently strengthen or weaken these commitments, and therefore cardinalities must match exactly or the merge fails.
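The difference between the two strategies can be made concrete. The Python sketch below contrasts range intersection with the strict rule described above. It is an illustration under assumptions: cardinalities are modeled as (lower, upper) pairs with math.inf for *, None stands for an unspecified cardinality that the other diagram may fill in, and all names are hypothetical.

```python
import math

class MergeError(Exception):
    """Raised when two specified cardinalities differ."""

def intersect(a, b):
    """Range intersection: semantically acceptable, but not what CD-Merge does."""
    lower, upper = max(a[0], b[0]), min(a[1], b[1])
    return (lower, upper) if lower <= upper else None

def merge_cardinality(a, b):
    """Strict rule: an unspecified side is filled in, specified sides must be equal."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise MergeError(f"cardinality {a} does not match {b}")

one_to_many = (1, math.inf)  # 1..*
one_to_four = (1, 4)         # 1..4

print(intersect(one_to_many, one_to_four))       # (1, 4)
print(intersect((0, 1), (0, math.inf)))          # (0, 1)
print(merge_cardinality(None, one_to_many))      # (1, inf)

try:
    merge_cardinality(one_to_many, one_to_four)
    strict_result = "merged"
except MergeError as err:
    strict_result = f"rejected: {err}"
print(strict_result)
```

The intersection silently drops the promise of an unbounded upper limit, whereas the strict rule surfaces the disagreement to the modelers.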

The next example illustrates a merge conflict that occurs when associations and inheritance interact, resulting in ambiguous associations in the merged class diagram. Each association introduces roles that define reading promises such as getLecturer() and writing promises such as setLecturer(Lecturer l), which are automatically generated from the association structure. Because roles form part of the public API of a generated class, they cannot be altered during merging; association targets and role names must therefore remain consistent across diagrams. Otherwise, the merge would silently change which object a getter or setter refers to, violating expected behavior.

In the given input diagrams, CD A specifies that Professor is a subclass of Lecturer and defines the association

association holds [1] Professor (lecturer) <-> Course [*]

while CD B uses the same inheritance hierarchy but defines the association

association holds [1] Lecturer (lecturer) <-> Course [*]

When merged, the result contains two associations with the same role name lecturer but attached to different classes, creating an ambiguous role on the Course side. It is unclear which association Course.getLecturer() should refer to or whether the lecturer should be of type Lecturer or Professor.
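Such a conflict can be detected by grouping association ends by the class that navigates the role. The sketch below is a simplified illustration: each association end is reduced to a hypothetical triple (navigating class, role name, target class), and roles inherited from superclasses are not considered.

```python
from collections import defaultdict

def ambiguous_roles(association_ends):
    """Return every (class, role) pair that leads to more than one target class."""
    targets = defaultdict(set)
    for source, role, target in association_ends:
        targets[(source, role)].add(target)
    return {key: sorted(t) for key, t in targets.items() if len(t) > 1}

merged_ends = [
    ("Course", "lecturer", "Professor"),  # from CD A
    ("Course", "lecturer", "Lecturer"),   # from CD B
]

print(ambiguous_roles(merged_ends))
# {('Course', 'lecturer'): ['Lecturer', 'Professor']}
```

A non-empty result means that a generated getter such as Course.getLecturer() would have no single well-defined return type, so the merge has to be rejected.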

Merging Generated Code

CD-Merge integrates with a modular code-generation workflow. When several class diagrams (CDs) are merged, the resulting model should still fit seamlessly into a larger software system composed of generated and handwritten Java code. A core goal of CD-Merge is that merging class diagrams must not break existing generated artifacts. If diagrams like CD1 and CD2 already have generated data classes, compiled class files, or handwritten extensions, all of these must remain valid and reusable after merging into CD M. This ensures that no regeneration is required, handwritten code continues to bind correctly, and existing modules still compile and run. In this way, merging supports a modular architecture where each class diagram can represent an independent subsystem.

Two important concepts in this workflow are:

  1. Late binding of handwritten code: Handwritten code (marked as «hc») is added after code generation and must not be overwritten or regenerated.
  2. Reuse of pre-generated/compiled submodules as black boxes: Subsystems that have already been generated, compiled or packaged into modules can be reused directly without re-generating or re-compiling them. The merge operation sees these modules as black boxes with well-defined external interfaces.

The figure illustrates a complete incremental workflow:

  1. CD1 is generated into Java code, compiled, and packaged into Module 1.
  2. Similarly, CD2 produces its own generated and compiled module. CD2 also has handwritten extensions (extensions2) that enrich the generated data classes.
  3. The merge operator combines the two class diagrams into a new, integrated diagram.
  4. From the merged diagram, new generated code (data M) and new compiled classes (classes M) are created, forming the basis of the overall system.
  5. Additional handwritten extensions (extensionsM) are bound into the final system in the same late-binding fashion.
  6. All modules are wired together without regenerating or modifying previously built components.

Conclusion

CDMerge brings the divide-and-conquer principle to class diagram modeling. Instead of maintaining a single large and monolithic class diagram, developers can decompose a system into multiple smaller, domain-specific component diagrams. Using the merge operator (⊕), these components can then be recombined into a consistent and unified model whenever integration is required.

A central strength of CDMerge lies in its semantically sound merging. Conflicting or incompatible model elements are not silently overridden or auto-repaired; instead, CDMerge explicitly detects such inconsistencies, ensuring that the formal semantics of the input diagrams are preserved.

As part of the CD4Analysis toolchain, CDMerge seamlessly integrates into existing MontiCore-based modeling and generation workflows.

The capabilities of CDMerge were demonstrated in the MaCoCo case study, where a very large class diagram describing a university information system was manually decomposed into four component diagrams. CDMerge successfully merged these components back into a coherent and semantically valid overall model, highlighting its practical value for realistic, large-scale modeling scenarios.

Literature:
