Ensuring software timing behavior in critical multicore-based embedded systems

Article By : Francisco J. Cazorla, Enrico Mezzetti, Ian Broster, Juan Valverde, Stefania Botta, Jaume Abella, Christos Evripidou, Javier Mora de Sambricio

Getting somewhere safely depends on more than just good brakes, working taillights, and someone with excellent reflexes behind the wheel. Increasingly, the components that keep your car on the road and your plane in the air are not only human, or even just mechanical. They are sophisticated pieces of embedded software running on complex heterogeneous multicore processors, controlling everything from flight management systems to power steering, and executing to strict timing deadlines measured in microseconds.

Herein lies the challenge. The timing behavior of a piece of software in a multicore system is affected not only by that software and its inputs, but also by the other software running on other cores.

Critical embedded systems take immense effort and investment to develop (millions of euros/dollars and years of engineering work). Safety has to be at the heart of the architecture and design, right from the earliest stages of the software development process. In particular, system designers must understand the timing behavior of their software, to ensure it can execute within safe timeframes.

Solving the multicore timing analysis (MTA) puzzle

Although the awesome computing capacity of a multicore processor should (in theory) make embedded systems more powerful and efficient, software executing on one core can slow down software running on the other cores. Software can take longer to execute because of interference: contention with tasks running on other cores for shared resources such as buses, memory, caches, devices, FPGAs and GPUs.

How do you quantify the effects of this interference? How do you analyze, test, and provide concrete evidence that your safety-critical software, when running on a multicore platform, can always execute within its timing deadlines?

Experts at the Barcelona Supercomputing Center (BSC), Rapita Systems Ltd (RPT), Raytheon Technologies (RTRC), and Marelli Europe (MAR) have been investigating answers to these questions for many years. BSC and Rapita have been developing a solution that will soon be rolled out across the aerospace and automotive industries. Specialized tooling and automation, combined with a requirements-based, safety-focused methodology, were the keys to solving the puzzle.

This work has formed the basis of the MASTECS project, a multi-disciplinary research and development project funded by the European Commission and launched in December 2019. The MASTECS project will mature the technologies and support their use for certification of avionics and automotive systems. A key part of the MASTECS project is to provide a demonstration of the approach in two industries through case studies deployed by RTRC and MAR.

State-of-the-art tools

Commercially available tools to support timing analysis are effective for simple (single-core) electronics, but do not scale to meet novel multicore-specific certification requirements and recommendations.

  • Static timing analysis solutions face a complexity wall and can neither effectively model the increasingly complex hardware nor efficiently deal with the structural and syntactical characteristics of exceptionally complex software functionalities.
  • Measurement-based solutions have reached a good level of penetration in the single-core analysis market (Rapita Systems’ RVS toolset being amongst the most successful ones). However, such tools cannot yet fully meet the challenges brought by the introduction of multicores. They typically focus on measurement scenarios determined by consolidated functional testing strategies, but lack a methodology grounded in hardware expertise that helps derive trustworthy timing bounds for tasks running on multicores, with the necessary supporting evidence and an adequate level of traceability.

To our knowledge, no commercial tool is available in the market, other than the one being matured in MASTECS, that is capable of analyzing the timing of software on multicore platforms, with strong focus on applicable safety standards and emerging certification requirements.

Interference analysis and control in action

The key to understanding interference is a structured test methodology, using hardware and software experts to produce evidence about multicore timing behavior. A specialized technology from BSC (known as multicore micro-benchmark technology or MμBT, commercialized by Rapita as RapiDaemons) lets system designers analyze and quantify the effects of interference in a multicore-based application by creating additional interference scenarios to stress-test different parts of the multicore processor.

Micro-benchmarks, at the heart of MμBT, are well-crafted pieces of code that operate at the lowest interface between hardware and software to stress a specific shared resource. Micro-benchmarks expose the impact of interference channels on software timing. To do so, they can be deployed to put configurable and quantifiable pressure on a specific application. Each micro-benchmark is designed to exhibit a single, clearly defined behavior with an anticipated effect on a specific hardware resource, while avoiding, as far as possible, contention on other interference channels. Key features of micro-benchmarks include the following:

  1. They put quantifiable pressure on a specific shared resource.
  2. Their behavior can be verified via event monitors.
  3. They capture specific timing-related requirements, e.g., whether the mitigation actions you put in place to master contention are effective.
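To make the idea concrete, here is a minimal sketch of the kind of access pattern a cache-stressing micro-benchmark relies on. Real micro-benchmarks are written close to the hardware, not in Python, and this is not the MμBT or RapiDaemons implementation. The toy fully associative LRU cache, the cache geometry, and the names `strided_addresses` and `simulate_lru_cache` are all assumptions made for illustration.

```python
from collections import OrderedDict


def strided_addresses(footprint_bytes, stride_bytes, passes):
    """Yield the byte addresses touched by a strided walk over a buffer."""
    for _ in range(passes):
        for addr in range(0, footprint_bytes, stride_bytes):
            yield addr


def simulate_lru_cache(addresses, cache_bytes, line_bytes):
    """Count (hits, misses) in a toy fully associative LRU cache.

    This is an assumed, simplified model; real caches are set associative
    and often use other replacement policies.
    """
    capacity = cache_bytes // line_bytes
    lines = OrderedDict()
    hits = misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in lines:
            hits += 1
            lines.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            lines[line] = True
            if len(lines) > capacity:
                lines.popitem(last=False)  # evict least recently used
    return hits, misses


CACHE_BYTES = 4096  # hypothetical cache size
LINE_BYTES = 64     # hypothetical line size

# Stressing pattern: footprint is twice the cache, one access per line.
stress = simulate_lru_cache(
    strided_addresses(2 * CACHE_BYTES, LINE_BYTES, passes=4),
    CACHE_BYTES, LINE_BYTES)

# Quiet pattern: footprint fits in the cache, so only the first pass misses.
quiet = simulate_lru_cache(
    strided_addresses(CACHE_BYTES // 2, LINE_BYTES, passes=4),
    CACHE_BYTES, LINE_BYTES)

print("stress (hits, misses):", stress)
print("quiet  (hits, misses):", quiet)
```

In this model the stressing walk misses on every access, so the pressure it puts on the next level of the memory hierarchy is known by construction. That is the sense in which the pressure is quantifiable.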

Figure 1: Use of micro-benchmarks in interference analysis. (Source: Authors)

A wide range of micro-benchmarks has been developed, each with a specific role: matching a desired level of interference, maximizing interference on a resource, or simply being very sensitive to contention (‘victims’).

In analyzing the effects of interference, the use of MμBT is supported by a task contention model (TCM) that provides early estimates of the contention delay tasks can suffer. The software automation and testing tools RapiTest and RapiTime, developed by Rapita, are used to write tests and run them on the embedded target.
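The article does not describe the internals of the TCM, so the following is only a sketch of what an early contention estimate could look like, not the model used in MASTECS. It assumes a simple additive model in which, on each shared resource, every contender access delays at most one access of the task under analysis by a fixed worst-case number of cycles. The class `ChannelProfile`, the function `contention_bound_cycles`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelProfile:
    """Access counts on one shared resource (all values are made up)."""
    name: str
    victim_accesses: int      # accesses by the task under analysis
    contender_accesses: int   # accesses by tasks on the other cores
    worst_delay_cycles: int   # assumed worst delay per conflicting access


def contention_bound_cycles(profiles):
    """Sum, over all channels, conflicts multiplied by worst per-access delay.

    Assumption: a victim access can only be delayed if a contender access
    exists to conflict with it, so conflicts <= min(victim, contender).
    """
    total = 0
    for p in profiles:
        conflicts = min(p.victim_accesses, p.contender_accesses)
        total += conflicts * p.worst_delay_cycles
    return total


profiles = [
    ChannelProfile("shared_bus", 10_000, 4_000, 8),
    ChannelProfile("dram_controller", 2_000, 50_000, 30),
]

isolation_cycles = 1_000_000  # hypothetical execution time in isolation
bound = contention_bound_cycles(profiles)
estimate = isolation_cycles + bound

print("contention bound (cycles):", bound)
print("early estimate (cycles):", estimate)
```

The access counts would come from hardware event monitors in practice, which is why the correctness of those monitors matters.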

Design methodology

By following a seven-step test design process along the standard software ‘V’ development process (Figure 2), engineers can more fully understand the impact of interference.

  1. Multicore processor critical configuration setting, interference channel and event monitor analysis. Hardware experts help identify the critical configuration settings, which set the framework in which interference channels and their mitigation measures are then identified. Identifying hardware event monitors is also instrumental, since they provide a means of verification for all the following steps.
  2. Identify timing requirements. Help the end user to identify their specific needs, timing requirements, risks and safety issues for the system. For example, verify the performance of any hardware isolation approach to minimize interference.
  3. Test case design. Develop specific test cases (descriptions of tests) to verify the set of hypotheses supporting the user requirements, including defining the MμBT items that will be required to provide evidence in the interference channel analysis. This involves execution in isolation (no interference) and execution against micro-benchmarks, to assess the application’s execution time and hardware sensitivity to interference under different quantifiable stress scenarios.
  4. Implementation of test procedures. Currently a manual process, to be automated in MASTECS, this step builds the test procedures, which consist of a test framework, micro-benchmarks and measurement probes to record/trace the results.
  5. Evidence gathering (testing). The test procedures are executed on the platform to gather test evidence. Currently involving some manual work, this will be automated in MASTECS using the RapiTest automation framework to execute those tests and link them back to verification requirements.
  6. Results analysis. Technical experts review the test results to check whether they verify (or otherwise) the verification requirements. For example, Figure 3 shows a RapiTime screenshot of the execution times reported for different functions of a program.
  7. Validate results and generate documentation. Final review of requirements, generation of documentation and qualification results to support the safety argument of the system. The customer can use the full set of reports and analysis artefacts directly for the certification of software running on multicore.
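At their core, steps 3 to 6 compare execution in isolation against execution under each stress scenario. The sketch below shows that comparison under stated assumptions: the timing samples are invented for illustration and are not measurements, the scenario names and the function `evaluate_scenarios` are hypothetical, and no RapiTest or RapiTime interface is used or implied.

```python
def evaluate_scenarios(isolation_us, contended_us, deadline_us):
    """Return {scenario: (slowdown, meets_deadline)}.

    Slowdown is the maximum observed time under contention divided by the
    maximum observed time in isolation. Observed maxima are evidence, not
    guaranteed bounds.
    """
    baseline = max(isolation_us)
    report = {}
    for scenario, samples in contended_us.items():
        worst = max(samples)
        report[scenario] = (worst / baseline, worst <= deadline_us)
    return report


# Made-up execution times in microseconds.
isolation = [410.0, 400.0, 405.0]
contended = {
    "l2_cache_stressor": [500.0, 512.5, 498.0],
    "dram_stressor": [640.0, 656.0, 630.0],
}

report = evaluate_scenarios(isolation, contended, deadline_us=600.0)

for scenario, (slowdown, ok) in report.items():
    verdict = "meets deadline" if ok else "exceeds deadline"
    print(f"{scenario}: slowdown x{slowdown:.2f}, {verdict}")
```

A scenario that exceeds its deadline does not by itself settle the question. It is an input to the expert review in step 6, which may lead to a revised hypothesis or a mitigation measure.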

Figure 2: MTA steps in the V-model software development process. (Source: Authors)

Hardware expertise and the timing analysis process

Injecting hardware (multicore) expertise is key to the success of the proposed MTA approach on modern complex multicores. During early software development stages:

  1. Hardware experts identify multicore configurations (critical configuration settings in avionics jargon), as these play a key role in determining the software's functional and timing behavior and largely affect the amount of contention tasks generate on each other. As an illustrative example, current processors implement isolation and segregation mechanisms that, if properly deployed, can heavily reduce contention.
  2. Multicore experts play a key role in identifying the resources in which task contention can arise (referred to as interference channels in avionics). The ability of hardware experts to navigate multi-thousand-page processor technical reference manuals, and to put the right questions to chip vendors about information that may be missing from those manuals, is fundamental to driving an appropriate MTA process.
  3. Once interference channels are identified, hardware experts identify the event monitors that can be used to track the activity tasks generate on those interference channels, as a proxy metric to bound the contention that tasks can suffer. The correctness of those event monitors must also be verified [2], for which a specialized set of micro-benchmarks has been designed.
  4. Finally, hardware experts work hand in hand with timing analysis experts to derive, from user requirements, high-level and low-level requirements and specific tests to validate the hypotheses supporting the user requirements. Each test instantiates one or several micro-benchmark programs designed by hardware experts to put the desired level of load on the target (set of) interference channel(s).
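Verifying an event monitor, as described in item 3, can be pictured as running a micro-benchmark whose number of accesses is known by construction and comparing that number with what the counter reports. The sketch below is one possible form of that check, not the procedure from [2]. The counter names, the readings, the 2% tolerance, and the function `check_event_monitor` are assumptions for illustration.

```python
def check_event_monitor(expected_events, observed_events, tolerance=0.02):
    """Return True if the observed count is within a relative tolerance."""
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    relative_error = abs(observed_events - expected_events) / expected_events
    return relative_error <= tolerance


# Expected count follows from how the (hypothetical) micro-benchmark is built.
ITERATIONS = 100_000
LOADS_PER_ITERATION = 8
expected = ITERATIONS * LOADS_PER_ITERATION

# Made-up counter readings; the names are not real event identifiers.
readings = {
    "L2_MISS_COUNTER": 803_100,
    "BUS_READ_COUNTER": 1_190_000,
}

verdicts = {
    name: check_event_monitor(expected, value)
    for name, value in readings.items()
}

for name, ok in verdicts.items():
    print(name, "consistent" if ok else "needs investigation")
```

A counter that disagrees with the expected count is not necessarily broken. It may count a different event than its documentation suggests, which is exactly the kind of question hardware experts take to the chip vendor.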

During late design stages:

  1. Hardware experts contribute to the analysis of test results to assess whether they confirm or reject the hypotheses.
  2. Hardware experts also contribute to establishing new hypotheses and the corresponding tests in case they are needed based on the results obtained in the previous step.

Figure 3: Analyzing results (RapiTime). (Source: Authors)

The bigger picture

The seven-step test design process is only one part of a wider multicore verification methodology shown earlier in Figure 2. This methodology, which will continue to be matured as part of the MASTECS project, is designed to achieve full traceability, from comprehensive evidence and results back to the corresponding requirements and designs. The methodology is designed to meet the objectives defined in CAST-32A, the key guidance document issued by aerospace certification authorities. It is also specifically aligned with ISO 26262, the safety standard for the automotive sector, which advocates freedom from interference.
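Traceability of this kind can be thought of as a matrix linking each requirement to the tests that provide evidence for it. The following sketch is one way to represent such a matrix and to list requirements that still lack passing evidence. It is not the MASTECS tooling. The identifiers, the `EvidenceRecord` class, and the functions `trace` and `open_requirements` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """One executed test and the requirements it claims to verify."""
    test_id: str
    requirement_ids: tuple
    passed: bool


def trace(requirements, evidence):
    """Map each requirement id to the evidence records that cover it."""
    matrix = {req: [] for req in requirements}
    for item in evidence:
        for req in item.requirement_ids:
            if req not in matrix:
                raise KeyError(
                    f"{item.test_id} cites unknown requirement {req}")
            matrix[req].append(item)
    return matrix


def open_requirements(matrix):
    """Requirements with no evidence, or with at least one failing test."""
    return sorted(
        req for req, items in matrix.items()
        if not items or not all(i.passed for i in items))


# Made-up identifiers for illustration only.
requirements = ["REQ-TIMING-001", "REQ-TIMING-002", "REQ-ISOLATION-001"]
evidence = [
    EvidenceRecord("TC-01", ("REQ-TIMING-001",), True),
    EvidenceRecord("TC-02", ("REQ-TIMING-002",), False),
]

matrix = trace(requirements, evidence)
still_open = open_requirements(matrix)
print("requirements without passing evidence:", still_open)
```

Rejecting evidence that cites an unknown requirement keeps the link two-way: every result traces back to a requirement, and every requirement shows what supports it.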

CAST-32A was published by the Certification Authorities Software Team (CAST) in 2016, and identifies factors that impact the safety, performance and integrity of airborne software systems executing on multicore processors. If you want to use multicore hardware in an avionics system, this is the go-to document. It provides objectives intended to guide the production of safe multicore avionics systems, including objectives related to identifying and bounding the impact of interference channels. EASA and FAA are working on an adaptation of the multicore generic CRI into common AMC/AC material (AMC 20-193). It is expected to be published “later this year”[3].

Expertise cannot be automated

Interference effects are complex. To unravel their mysteries, you need experts who understand both the components of the multicore architecture, and the scheduling and resource allocation systems in the software. Collaboration between hardware and software experts will be a central feature of the MASTECS project as it continues into the future. But while collaboration leads to great strides in software tooling and automation, it’s important to remember that you can’t automate every step of a validation process – especially not when multicore timing analysis is involved.

You need experienced engineers who know the systems in detail. For example, during the early stages, multicore experts can identify the processor configurations (also known as hardware critical configuration settings) that determine the software’s functional and timing behavior, as well as the potential interference channels. When it comes to analyzing test results, nothing beats the input of an experienced human expert to revisit and evaluate the original assumptions made about the platform, and use their knowledge to feed into a new testing cycle.

— Dr. Francisco J. Cazorla (BSC) is the leader of the Computer Architecture / CAOS group in the BSC and the technical coordinator of MASTECS.
— Dr. Enrico Mezzetti (BSC) is a senior researcher in the Computer Architecture / CAOS group.
— Dr. Ian Broster (RPT) is a founder and the General Manager of Rapita Systems Ltd.
— Dr. Juan Valverde (RTRC) is a Staff Research Scientist at United Technologies Research Centre Ireland Ltd.
— Stefania Botta (MAR) has a degree in Computer Science and is part of the “Software Tools and Methodologies” team in the Marelli Powertrain B.U.
— Dr. Jaume Abella (BSC) is a senior researcher at BSC.
— Dr. Christos Evripidou (RPT) is the Technical Lead of Rapita Systems’ UK Multicore Timing Analysis team.
— Dr. Javier Mora de Sambricio (RTRC) is a Senior Research Scientist at United Technologies Research Centre Ireland Ltd.
