Research Software Engineering

White Paper by Bernhard Rumpe, Software Engineering, RWTH Aachen

Research Software Engineering (RSE) is a term that was created around 2010 and has started to become prominent in the UK, USA, and now also Germany. Wikipedia defines it this way:

Definition: Research Software Engineering (RSE)

Research Software Engineering is the use of Software Engineering practices in research applications.

Software Engineering (SE)

Similar to the original term “software engineering”, the creation of “RSE” was felt necessary because there was and is a crisis in software development, especially a “research software crisis”. Software Engineering became prominent when F. L. Bauer hosted the first conference on Software Engineering in 1968 to address the software crisis. Since then, the Software Engineering discipline has gained a lot of insight into the process of software development, culminating in

  • several introductory and expert books on software engineering,
  • many more books on dedicated sub-fields of software engineering, namely requirements engineering, architecture, design, modeling, testing, development processes, and of course programming,
  • and the Software Engineering Body of Knowledge (SWEBOK V3.0) that structures and aggregates what software developers have learned in the last 50 years.

Software Engineering Challenges

Software is rather heterogeneous, ranging from embedded to desktop, to autonomous, to games, to business, and also to research software. While the problems are always the same, namely:

  • How to ensure the quality of the software?
  • How to efficiently develop the software (i.e., preserve developer resources)?
  • How to meet timing deadlines?

The answers, and in particular the development techniques, often differ across the various sub-areas of software development, because the starting situation, the kind of software, the complexity drivers, the needed quality characteristics, the context in which the software operates, and the skills and preferences of the developers differ.

Software Engineering: 50 years, culminating in the SWEBOK

The Software Engineering Body of Knowledge (SWEBOK) defines 15 knowledge areas:

  • Software Requirements
  • Software Design
  • Software Construction
  • Software Testing
  • Software Maintenance
  • Software Configuration Management
  • Software Engineering Management
  • Software Engineering Process
  • Software Engineering Models and Methods
  • Software Quality
  • Software Engineering Professional Practice
  • Software Engineering Economics
  • Computing Foundations
  • Mathematical Foundations
  • Engineering Foundations

For business software, we know that programming activities account for only 15% of the overall development time. For skilled people, constructing the actual software is a relatively well-understood activity, but many very costly errors are made during the early phases, when requirements are misunderstood or the architecture is defined wrongly.

Software Engineering Areas

Software Engineering doesn’t only comprise the subdisciplines mentioned above, which cover the different activities within a software engineering project. Due to the various application areas, software engineering is also organized in domain-specific subdisciplines or partners with related domain-specific topics, so that a holistic, integrative development approach becomes feasible:

  • Automotive Software Engineering,
  • Information Systems,
  • Avionics Systems and Software Engineering,
  • Rail Software Engineering,
  • Embedded Software,
  • Cloud Software Engineering,
  • Data Science and Engineering,
  • Medical Software Engineering,
  • Government Software Engineering,
  • Gaming Software Engineering,
  • and now also Research Software Engineering.

Other domains, like Quantum Software Engineering, are also emerging.

Research Software

Since 2010, communities have been created, conferences have been organized, and people have exchanged their findings on how to overcome the challenges of research software development, i.e., how to overcome the research software crisis.

What differentiates the challenges of RSE from other forms of SE?

Kinds of Research Software

First, we recognize that there is no uniform kind of research software. Instead, research software mainly falls into one of the following categories (and sometimes combinations):

  • Embedded control software for complex physical or chemical experiments, including many forms of sensor-based data collections
  • Simulation of physical, chemical, social, or biological processes in geometrically distributed spaces
  • Data processing and aggregation
  • Symbolic manipulation systems, such as computer algebra systems or theorem provers
  • Demonstrators and prototypes of various forms and with a large variety of goals

While the embedded control software also has its challenges, the main challenges seem to be the simulation and the data processing codes that are so heavily needed by researchers.

But software consists of parts from different technical domains. While the mathematical research part obviously receives the main focus, in practically useful programs it often amounts to only 20-30% of the code (an estimate based on personal insight into some projects, not thoroughly validated). The much larger rest has to deal with:

  • App infrastructure
  • User interactions
  • Visualization
  • Storage and transaction management
  • Communication between computing and storage nodes
  • Interacting with neighboring systems
  • Providing web services
  • Rights and roles management for access and prevention of undetected changes
  • Technical monitoring
  • Testing
  • Installation, deployment, orchestration

And this is very similar to many other kinds of software. The amount of technical code tends to grow more quickly than the functional research part.

Characteristics of Research Software

Several common characteristics of research software are:

  • It is used for research in the original research domain and is thus originally only a by-product, and is also treated as such.
  • Focus is the publication of the results.
  • Requirements of the software under development are initially unclear and the software development process is heavily intertwined with the research and innovation process, both on the scientific domain and its mapping into software.
  • Developed by researchers of various domains, but mainly not by computer scientists (nor software engineers)
    • An often seen scenario: The complete software is developed by a single Ph.D. candidate and erodes after the Ph.D. is done.
    • A similar scenario: the overall software is complex and has been developed over a long period of time by many developers, but the newly added packages have also only a single Ph.D. author and erode after he/she leaves.
  • Reuse is difficult, because the software is not designed for reuse.
  • Applicability of software is limited, because it has been designed for a single use case or a small set of use cases, and extensibility and flexibility weren’t designed into it.
  • Rewriting of software is done intensively.
    • Within a Ph.D. project, when the desired outcomes (i.e., requirements) change, the software is rewritten.
    • If a new researcher takes over the existing software, a long-lasting rewrite takes place to accommodate the new developer’s preferences. (“Don’t trust foreign code”)
  • Goals of the research institution, respectively their professorial leaders, significantly differ from the goals of the developing researchers, namely: long-lasting, sustainable software programs that can be used as infrastructure for research vs. obtaining quickly publishable insights.

However, the world is changing: the demand for reproducibility of results also enforces publication of the data and of the underlying software in a permanently available and executable form. For example, Zenodo stores software permanently, but doesn’t ensure executability (yet). Code Ocean, however, already does.

Research Software Engineering

As a consequence of the sustainability and reproducibility requirements on research software, the software turns from a by-product into a long-living, sustainable asset, if not into a core research infrastructure, where demands on the quality of the code increase heavily. As a consequence of the software crisis in other development domains, software engineering has long discussed these and other typical quality attributes, namely understandability, documentation, reuse, and the ability to evolve.

It is therefore time to focus on the question of how to transfer the body of software engineering knowledge to at least the sustainable part of the research software. Recall:

“Research Software Engineering is the use of Software Engineering practices in research applications.”

But which practices are actually useful in RSE? Which can be ignored? Which are omitted or forgotten, but would be useful?

We observe:

Focus: Processing Efficiency
  • SE: The focus of many SE techniques is the efficiency of the developers during the development project, because complexity of software functions is more demanding than the size of data and the computational effort.
  • RSE: Traditionally, High-Performance Computing (HPC, which covers a large part of RSE) focuses mainly on execution times. Simulations and various other forms of number crunching and data processing force RSE to focus on the efficiency of the programs.
  • RSE tomorrow: Complexity of programs steadily increases, and probably both the efficiency of the developers and the efficiency of the programs will receive focus in the future.
Reuse
  • SE: Traditionally focuses on reuse and provides a large set of mechanisms to increase the reusability of software. This starts with language constructs, such as inheritance, import of other modules/classes, and explicit definition of interfaces, that allow (but do not enforce!) reuse. This continues on the methodological level, where development processes for frameworks, for features in software product lines, for components, microservices, independently deployable and versioned subsystems, etc. are established. Architectural and design patterns are of great help.
    • Software design and architecture must anticipate reuse.
  • RSE: To some extent uses these techniques, but too often these high-level techniques are not applied. That is natural, as the focus is not on reuse and an explicit architectural design activity is not in place. In order to increase reusability, rewriting code is fostered.
Modularity
  • SE: Modularity is a core technique to assist in fostering reusability. It is applied on various levels, starting with the building of classes, and the design and architecture of components, subsystems, etc. Modularity starts with the requirements, but demands much attention during architectural design and affects the organization of distributed development and modular quality assurance. Achieving modularity is a professional skill that has to be learned and trained.
  • RSE: Modularity is applied mainly in class design.
Modeling
  • SE: Explicit use of software models using UML (or SysML). Domain specific languages (DSL) are used in various domains with specific characteristics, as well as for configuration and increase of modular use of components. These DSLs include e.g. building information modeling (BIM), control automata, etc.
  • RSE: Sometimes applied too. The core form of models are the mathematical constraints between physical quantities, timing and geometric topics. Many of them are differential equations.
  • RSE tomorrow: The mathematical constraints, typically defined in scientific papers, and their software implementation need to have more focus. Especially ensuring the consistency between efficient implementations and the mathematical formulae (seen as requirements), tracing changes, explicitly specifying limitations need better assistance by RSE tools. This may include generators, more domain-specific languages, but also more math compilers. But also the software itself will need design techniques and the variants of UML-like techniques need to be integrated.
Automation With Smart Tools
  • SE: Continuous integration is of great help for developers to get immediate feedback and for project managers to keep track of and maintain an overview of the project progress. Tools help with refactoring and detect malicious or dead code, security vulnerabilities, architectural deficiencies, etc.
  • RSE: These techniques are applied, e.g., using Git, GitHub, or GitLab capabilities.
  • RSE tomorrow: Assistance seems to be helpful for purposeful use. RSE-specific solutions could help more.
Versioning
  • SE & RSE: The Git version control system, with openly accessible platforms such as GitHub as well as locally deployable GitLab tooling infrastructures, is generally applicable. These provide helpful assistance, e.g., a ticket system, continuous integration, versioning, branching, variant management, etc. Skillfully applied, they improve the development process considerably.
  • SE & RSE tomorrow: Currently many tools are being developed around Git to improve this assistance even further. For example, detection of security breaches, automatic deployment, artifact management, and documentation generation help. Increasingly, projects, companies, and sometimes also specific communities are creating their own domain-specific tools, which know specifics of the domain and as such can help with developing and maintaining the code even better. The Linux open source project with its kernel, master and lieutenant system may act as a good blueprint for building a chain of trust and quality.
Long-Lasting / Sustainable Software
  • RSE: It is true that in RSE some larger (but by far not all) pieces of programs should be long-lasting and sustainable, so that they become reusable for the various research questions to be addressed based on these programs.
  • SE: This is true for other software as well. Billions have been invested in developing software, e.g., in insurances, banks, government, and back-offices of companies and those assets must be usable as long as possible.
  • SE & RSE: Same problems. Even though the underlying development goals are different, the methods to achieve sustainability for software are potentially similar. This does not only include the software, but also its accessibility, for example in the form of documentation and of people who have knowledge about the software in its complexity. Open source alone is not enough to ensure sustainability.
Reproducibility
  • SE: This challenge occurs, for example, when legal aspects come into play. Banks and insurance companies have to provide access to the software, e.g., to tax authorities, even 10 years after the software was taken out of operation in hot standby, and must be able to reconstruct the software service for 20 years while the underlying computing infrastructure evolves rapidly. And the correctness of the software has to be ensured all the time.
  • RSE: The same problem occurs, even though the reason is different and the timing is not constrained by legislation, but by scientific standards.
Framework Development Strategies
  • SE: SE has created various forms of modularity and reusability, including the concept of frameworks with hot spots, various forms of configuration techniques and tooling. Frameworks differ from pure libraries, because they are tightly integrated, establish the control flow and allow developers to plug in individual adaptations of functionality.
  • RSE: Mainly has monolithic codes, which are openly accessible, but difficult to understand. Documentation is sometimes available, often as reference to (static, unchangeable) published papers. Libraries of core functions are established.
  • RSE tomorrow: The framework and the product line concepts must be applied more often to achieve the benefits stated above.
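The difference between a framework with a hot spot and a pure library can be illustrated with a minimal Python sketch. All names here (SimulationFramework, ExponentialDecay, step) are hypothetical and chosen only for illustration; the sketch assumes a toy model dx/dt = -k * x. The framework owns the control flow (the time loop), and the researcher only plugs in the model-specific step.

```python
from abc import ABC, abstractmethod


class SimulationFramework(ABC):
    """Hypothetical mini-framework: it owns the control flow (the time loop)."""

    def run(self, state: float, dt: float, steps: int) -> list[float]:
        # Frozen part: the framework decides when the plugged-in code is called.
        trajectory = [state]
        for _ in range(steps):
            state = self.step(state, dt)  # call into the hot spot
            trajectory.append(state)
        return trajectory

    @abstractmethod
    def step(self, state: float, dt: float) -> float:
        """Hot spot: researchers plug in their model-specific update here."""


class ExponentialDecay(SimulationFramework):
    """Example adaptation: explicit Euler step for dx/dt = -k * x."""

    def __init__(self, k: float) -> None:
        self.k = k

    def step(self, state: float, dt: float) -> float:
        return state + dt * (-self.k * state)


# The researcher never writes the loop; only the hot spot is filled in.
trajectory = ExponentialDecay(k=0.5).run(state=1.0, dt=0.1, steps=3)
```

With a pure library the researcher would write the loop and call library functions; here the direction of control is inverted, which is what allows the loop, logging, or checkpointing to be maintained once for all plugged-in models.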
Testing
  • SE: There is a tremendous amount of theory and practical advice available that allows developers to create tests for various purposes: from small unit tests, through component and integration tests, up to end-to-end user tests covering requirements, models, code, and input data spaces in appropriate form. Testing frameworks make it easy to automate these tests and thus repeat them on each evolutionary step.
  • RSE: Not much testing is done. Test automation is widely ignored. To some extent this is because manual tests have been sufficient in a setting with reduced execution variability: sometimes there is only one data set, and testing amounts to executing the program and getting the result immediately. So far, plausibility checks replace testing.
  • RSE tomorrow: RSE will need more automatically repeatable tests. It may be that the form of end-to-end tests is rather specific, because, for example, in simulations the timing aspect plays a role. Testing massively distributed software also has its own challenges. Adequate tooling is needed, because otherwise evolution and reproducibility won’t happen.
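As a minimal sketch of how a manual plausibility check can become an automatically repeatable test, the following Python example uses the standard unittest module. The function simulate_decay is a hypothetical stand-in for research code (explicit Euler for dx/dt = -k * x), and the tolerances are assumptions chosen for this toy case, not general recommendations.

```python
import math
import unittest


def simulate_decay(x0: float, k: float, dt: float, steps: int) -> float:
    """Toy stand-in for research code: explicit Euler for dx/dt = -k * x."""
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)
    return x


class DecaySimulationTest(unittest.TestCase):
    def test_matches_analytic_solution(self):
        # Reference: x(t) = x0 * exp(-k * t); the tolerance allows for Euler's error.
        result = simulate_decay(x0=1.0, k=0.5, dt=0.001, steps=1000)
        self.assertAlmostEqual(result, math.exp(-0.5), delta=1e-3)

    def test_error_shrinks_with_step_size(self):
        # Convergence check: halving dt (same end time) should reduce the error.
        exact = math.exp(-0.5)
        coarse = abs(simulate_decay(1.0, 0.5, 0.01, 100) - exact)
        fine = abs(simulate_decay(1.0, 0.5, 0.005, 200) - exact)
        self.assertLess(fine, coarse)


# Run the suite programmatically so it can be repeated on every change.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DecaySimulationTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is an example of a domain-specific check: it does not compare against one stored result but against an expected numerical property, which tends to survive refactoring of the simulation code.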
Development Methods
  • SE: Various forms of development methods have been established. These include V-model, Rational Unified Process, Extreme Programming, Scrum, and many other agile or more document oriented development processes.
  • RSE: Hacking is still applied too often. Especially the requirements engineering process differs very much from traditional SE. In many Ph.D. projects the candidate is the developer and his/her own single customer.
  • RSE tomorrow: Depending on the form of software, the number of people, their skills, the expected quality and outcomes, timelines, etc., several different forms of methods will be needed. The core need will be that someone in the project knows about the activities and their interactions in a development project, selects the right process, and ensures that the participants live it in the project.
    • Exploration of new research ideas plays a major role in RSE and must be well integrated into an RSE method. Again: Explicit innovation enabling techniques are already well established in core SE and should be applicable in adapted form.
Consulting
  • SE: In the industrial practice of SE, consulting has become a major driving force. There is a large set of companies that employ consultants with various specific skills, who work in sometimes large projects to develop various forms of software.
    • Skills may be technical, for example dealing with specific software stacks, or they may be applicable for specific activities, such as architectural design, requirements engineering, testing, database connection, installing software as a service, or tool building and adaptation.
    • Consulting may also consist of various different activities, such as providing templates, giving courses, training on the job, code reviews, tool and software stack assistance, or other kinds of trouble-shooting and problem-solving.
    • Even co-development is doable when domain-specific know-how and thus pi-shaped skills (software engineering + domain knowledge) are present.
  • RSE: First research organizations are starting to apply in-house consulting.
  • RSE tomorrow: In-house consulting is a possibility and maybe also a necessity to temporarily bring experienced developers with specific skills to the application projects that need these skills.
    • As a consequence: groups of skilled software engineers (including developers) will be established at least at larger research organizations, to assist with various stages of development and potentially also maintenance and curation in the later phases of sustainable software.

RSE Research

For a precise understanding we’ll put this into parentheses: (((research software) engineering) research) is a foundational field of research that focuses on the engineering techniques, tools, methods, frameworks, etc. to develop research software. It will take inspiration from (generic) software engineering research and find solutions specific to the other research domains.

RSE research has to tackle a number of questions that do not only address the software as a product, but also assistance for the domain-specific researcher as a human being with limited time and always optimizable skills for software development:

  • How does RSE differ from SE and what specific activities have to be addressed by different methods and tools?

  • How can classic SE approaches, like V-Model, RUP, Scrum, or Extreme Programming, be adapted to RSE needs?

  • Or is an entirely new development process needed?

  • How to run empirical validation in such a context? (At least researchers could understand that empirical validation is needed, and would hopefully be assisted?)

  • How to optimize research software requirements elicitation? This is typically deeply connected with the domain-specific research process itself, which is why traditional requirements engineering techniques don’t fit.

  • How to define robust architectures that are extensible, maintainable, and allow reuse for research software, where the forms of extensions and connections to neighboring functionalities are initially very unpredictable?

  • How to gain precise understanding of optimal forms of reuse: from (1) simple copy-paste, through (2) branching and (3) black-box reuse up to (4) pre-deployed service reuse? And this is needed across heterogeneous languages, software stacks, and hardware infrastructures and must be seen in the context of long-term evolution.

  • How to document and model the desired software with the purpose of (a) effective development, (b) possible code generation, (c) sustainable reproducibility, and (d) comprehension in the sense of FAIR research data management? This is a multidimensional problem, because software needs to evolve due to (1) bug fixing and (2) evolution of the software stack it is embedded in, such as underlying external data sets and their service APIs, security patches of the operating system or libraries, etc. Docker is only of local help. Software also comes in technical and functional variants and is a living object that differs very much from passive data. Currently research data management initiatives, such as the German FDMI, are only starting to address this.

  • How to manage long-term evolution with fluctuating developer groups?

  • How to manage, govern and analyse variant-aware and variant-rich research software in long-term projects?

  • What tools are needed to easily automate various tedious activities that researchers would like to avoid?

  • What are useful domain-specific languages (DSLs) for RSE, either to raise developer efficiency or to decrease quality issues?

  • How can DSLs bridge the published mathematical, physical, biological, medical, etc. laws and findings on the one hand and the executable code on the other hand?

  • What is appropriate meta-information about the software artefacts, their states and forms of review, certifications etc. to build trust, enable reusability, and document quality?

  • How can software analysis tools contribute, e.g., checking consistency between mathematical models in published papers vs. the implemented code, which might even lead to a change in our publication culture?

  • How can tools for dead code, inefficient code and other code analysis techniques effectively help the researchers to improve quality?

  • How to test and what to test with which intensity? Or, more generally: What is high quality assurance for research software in order to achieve high quality research results? We need adequate and efficient testing, reasoning, reviewing, or certification procedures.

  • How can an intelligent research copilot help to address coding issues?

  • How to assess and optimize development skills for researchers?

Many of those questions are not specific to RSE but apply to classic SE as well. Although some of the questions are very general, the answers must typically be very domain-specific to yield improvements for the domain-specific challenges. Classic SE has many (still optimizable) answers to these questions, but it is unclear how they transfer to RSE.

Some Research Topic Examples

The following is a very incomplete set of lists, just to give an example of what is needed:

  • Consistency between academic paper and code, through techniques like the following:
    • code generation using a LaTeX formula as DSL, or very similar
    • code generation from new DSLs that generate both code and LaTeX
    • derive test cases from the scientific paper (in case no code can be generated)
    • verification of the relationship of the code to the mathematical model
  • code analysis as retrospective: not only with the usual code analysis results such as deadlock freedom, but to answer the question of whether the code covers the right scientific models (up to numerical analysis of rounding problems)

  • “Research ChatGPT” as the co-pilot of the scientists
    • Both for creating code and for creating tests
    • And if models are relevant: also for finding scientific relations in the form of mathematical models
  • Energetic analysis: CO2 footprint of the software

  • How do we combine mathematical models (essentially differential equations) with the theory of digital software models (mainly automata, Petri nets, temporal logic, etc. but also structural models such as class diagrams)?
    • There is still no reasonable solution today, probably also because there is no optimal solution; rather, depending on the scale of the processes, these different kinds of models interact differently.
  • Domain Specific Languages (DSL) for research (like we do in MontiCore): Sometimes it may be worthwhile to provide researchers with a conceptually reduced, problem-adapted language and allow researchers to model in their own vocabulary instead of writing code.

The bottom line is: the connections between

  • scientific publication,
  • domain theory and its models,
  • code and
  • tests

are to be addressed even more clearly. Ideally these are treated with automated solutions, which obviously also includes a variety of tools specifically dedicated to research software.
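One of the ideas listed above, a DSL from which both code and LaTeX are generated, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the tiny expression classes (Var, Mul, Neg, Exp) and the generator functions to_latex and to_python are hypothetical, cover just products, negation, and the exponential function, and ignore operator precedence beyond what this example needs. It is not how MontiCore or any existing tool works.

```python
import math
from dataclasses import dataclass


class Expr:
    """Base class of a hypothetical, deliberately tiny formula DSL."""


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Neg(Expr):
    arg: Expr


@dataclass(frozen=True)
class Exp(Expr):
    arg: Expr


def to_latex(e: Expr) -> str:
    """Generate the formula as it would appear in the paper."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Mul):
        return f"{to_latex(e.left)} \\, {to_latex(e.right)}"
    if isinstance(e, Neg):
        return f"-{to_latex(e.arg)}"
    if isinstance(e, Exp):
        return f"e^{{{to_latex(e.arg)}}}"
    raise TypeError(f"unsupported node: {e!r}")


def to_python(e: Expr) -> str:
    """Generate executable Python source from the same model."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Mul):
        return f"({to_python(e.left)} * {to_python(e.right)})"
    if isinstance(e, Neg):
        return f"(-{to_python(e.arg)})"
    if isinstance(e, Exp):
        return f"math.exp({to_python(e.arg)})"
    raise TypeError(f"unsupported node: {e!r}")


# Single source of truth: x(t) = x_0 * exp(-(k * t))
model = Mul(Var("x_0"), Exp(Neg(Mul(Var("k"), Var("t")))))

latex_source = to_latex(model)    # goes into the publication
python_source = to_python(model)  # goes into the research software

# Compile the generated source into a callable (acceptable for a sketch only).
decay = eval(f"lambda x_0, k, t: {python_source}", {"math": math})
value = decay(2.0, 0.5, 2.0)
```

Because the paper formula and the code are derived from the same model, a change to the model changes both, which is one possible way to keep publication and implementation consistent.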

Recommendations and Concluding Observations

The development of research software faces the same challenges as developing any other kind of software. The software engineering body of knowledge addresses these challenges and is of considerable help. Unfortunately, there is no silver bullet and software engineers are not firefighters who can easily rescue software that has degraded over several years. However, software engineering sometimes provides witchcraft-like techniques, which help best when applied early.

Research software engineering correctly pushes these techniques into development projects for researchers, but domain-specific adaptations are necessary.

Software development tools for the automation of various activities and wizard-like assistance have enormously increased productivity and lowered the hurdles for newcomers to create pieces of software. Moreover, these tools nowadays enable non-experts to develop significant pieces of software and leverage the knowledge of core software structures, such as persistent storage, communication, compilation, computation orchestration, etc.

Computer science can be very proud of having understood the core technologies, turning scientific knowledge into automatically executable algorithms and frameworks, and embedding these into tools usable by non-experts without having a deep understanding of the internal mechanisms of these tools. This also includes integrated development environments (IDEs) with highly assistive editors, Low Code approaches using scientific or other explicit modeling techniques, automated continuous integration, code analysis, etc.

Computer science can be proud of enabling non-experts to write expert-like software. No one would believe that building even such a simple thing as a one-family house can be done by non-architects, but in software development, we have achieved exactly this. But the tooling infrastructure is still under intense development, and much more can be done.

  • Recommendation 1: It is necessary to build better domain-specific tooling to address the domain-specific challenges of research software. Wizard-like smart tools help development amateurs and, to some extent, prevent them from having to focus too much on SE skills themselves.

Smart, possibly domain-specific tooling is part of the research infrastructure and needs high quality. It cannot only be a research prototype and thus should be professionally planned, engineered, and managed.

However, there is complexity in research software that doesn’t go away. If the software shall not be a throw-away software, there are additional methodical topics to address that currently cannot be completely automated.

  • Recommendation 2: Researchers that create software for a sustainable, long-lasting infrastructure need to be trained in software engineering skills, which drastically differ from mere programming skills.

Not all software needs to be sustained; sometimes simple throw-away experiments are OK, but developers should be aware of the expected outcomes and life expectancies of these outcomes. Their development processes should contain the appropriate measures and mechanisms to achieve their goals.

Researchers need to be aware that software engineering is not only about getting the code right but also involves architectural, design, quality assurance, and management soft-skills to be adopted and lived during a development process.

Because we know that these skills are not easy to adopt, there is a third relevant possibility:

  • Recommendation 3: Have one or more software engineers be part of your project to get the software and the technical architecture right, adopt the appropriate tools and quality mechanisms, etc.

This can, for example, be done by permanent employment but also by consulting offers, which seems to be an increasingly feasible approach. Various British institutes and Germany’s Helmholtz seem to have started this approach by installing research software engineering consultants with primary software engineering skills and some secondary knowledge about the research domain. This seems to be promising because software engineers are trained to collaborate, and software engineering methods very well assist collaborative approaches.

However, software engineering is a holistic approach, and many strategic decisions have to be taken, therefore:

  • Recommendation 4: Establishing RSE principles is a core topic for the management, i.e., the professors and the research institutions, that needs to be addressed adequately.

Finally, we have learned that there are generic software engineering techniques that can be applied in many domains, but due to the domain-specific differences in characteristics, it is also useful to adapt, enhance and possibly create domain-specific techniques, tools, methods, frameworks, etc. This brings us to:

  • Recommendation 5: Establish research software engineering research as a research field over RSE.

RSE research will probably keep us busy for a number of years as a foundational field of research, which obviously is to be executed by SE researchers and not so much by the domain researchers themselves.

Thank you Marco Konersmann, Florian Rademacher and Lucas Wollenhaupt for commenting on a draft.