Software-defined automobiles: An efficient platform for essential parallelization

By Mathias Fritzson

Are we at a critical juncture in how the industry moves forward with software-defined automobiles? As automotive functionality increases and certification and compliance become increasingly more difficult, one approach is to continue adding multiple CPUs to a vehicle’s embedded system for additional capability and to rely on the merits of more powerful CPUs and multicore processing.

On the other hand, some would say it’s time to condense multiple CPUs into a single, onboard supercomputer. Do the advances in computing power allow us to replace the multiplicity of electronic control units (ECUs) with one consolidated machine?

It’s safe to say today’s vehicles contain a multiplicity of computers.

Apple co-founder Steve Jobs once said, “The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean two, yeah. Four, not really. Eight, forget it.” By structuring and understanding the problem, we can make good use of multicore systems; that part is manageable. The hard part, and what Jobs was alluding to, is applying multiple cores to a single problem, a single process.

So the answer is yes to multicore processing – if we’re smart about it.

Today there is intense demand for more multicore use cases. The Autonomous, Connected, Electric, Shared (ACES) movement is driving more complex electrical/electronic (E/E) architectures. With ACES, we see further electrification of features and consolidation of ECUs. But adding more cores and more computational power, such as higher CPU frequency, does not by itself make it any easier for the engineer to implement such functionality. As Steve Jobs said, you can’t just throw more CPUs into the mix. You need structure.

The problem to be addressed by the automotive industry is parallel execution of a parallel application. That is, how many applications can be localized to the same ECU so independent vehicle functions can be executed in parallel?

The era of software-defined vehicles

What is the software-defined vehicle and what does it contain? We can talk about the E/E-enabled features that were once purely mechanical and are now more mechatronic. All of these functions depend on localization and communication along with the required control algorithm.

When we say localization, we’re talking about sensors and actuators in their respective physical locations. Communication is about multiplexed and prioritized shared resources such as CAN or Ethernet. Data from the sensors is sent to an ECU, which executes the control algorithm and then sends commands to the actuators. By combining the sensors and actuators with calculation capacity, engineers can increase the feature set of a software-defined vehicle without additional hardware. Examples of parallelized functions include power steering, electric windows, and power-assisted braking – all of which can run concurrently.
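As a concrete illustration of this sensor-to-ECU-to-actuator loop, the sketch below shows a single control step for a power-steering-style function. All names, types, and the gain value are hypothetical and for illustration only; they are not AUTOSAR APIs, and a real algorithm would add filtering, speed dependence, and limiting.

```c
#include <stdint.h>

typedef struct {
    int16_t steering_torque;   /* measured driver torque, 0.01 Nm units */
} SensorFrame;

typedef struct {
    int16_t motor_current_cmd; /* assist motor current command, mA */
} ActuatorCmd;

/* One cyclic control step: take sensor data, compute, command actuator. */
ActuatorCmd run_control_step(SensorFrame in)
{
    ActuatorCmd out;
    const int32_t assist_gain = 5; /* illustrative proportional gain */
    out.motor_current_cmd = (int16_t)(assist_gain * in.steering_torque);
    return out;
}
```

In the vehicle, this step would run cyclically on the ECU, with the sensor frame arriving over the communication bus and the actuator command leaving the same way.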

The advantages of appropriate partitioning
What is the value of partitioning an application? There are two main advantages. The first is performance, normally gained by parallelizing calculation packages and identifying the minimal datasets to communicate so that the calculation jobs can run efficiently in parallel. The second is separation for functional-safety or cybersecurity reasons: one part of the software runs without affecting, influencing, or gaining access to other parts.

It’s important to understand the impact that partitioning has on software execution. Parallel processes can be achieved with time slicing on a single core, but that’s not the focus of this article. Running multiple partitioned applications in parallel on different cores reduces blocking of CPU time. Other resources that cause hidden latency, like memory access (bus wait states), or that might cause interference, like MCU peripherals and communication bus access, argue for keeping working datasets in separated, localized memory.

AUTOSAR – the ECU platform of choice
For modern ECU development, the AUTOSAR methodology is the obvious choice. Automotive OEMs and their tier one partners have been using it for over a decade and it’s been successfully deployed in all types of vehicles today. The AUTOSAR methodology is part of systematic vehicle development (Figure 1).


Figure 1: The AUTOSAR methodology is part of the systematic vehicle development. (Source: Siemens Digital Industries Software.)

The system development tools connect in chains from high-level requirements to implementation details to verification rules for system consistency. In this way, users can generate and produce more refined views of the system as it moves closer to final implementation. The AUTOSAR-defined data exchange formats, ECU Extract and ECU Diagnostic Extract, connect the ECU system design to the ECU design.

The AUTOSAR exchange formats carry the refined information about the ECU communication as well as the application, referred to as the software component (SWC) with execution needs and data exchange details. The AUTOSAR methodology describes the process of transforming the system definition into an ECU configuration. The generation process takes the provided input, combines it with the resources of the configured ECU hardware, and generates a customized configuration that supplies the needs of the software components. The now configured embedded software platform (called basic software, or BSW) underpins the software components, providing an operating system (OS) and the necessary resource management, enabling the efficient deployment of the ECU application.
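To make the software-component concept more tangible, here is a sketch of what a periodic runnable with RTE-style port access might look like. The port names (SpeedIn, CmdOut) are invented, and the Rte_Read/Rte_Write functions are stubbed so the sketch is self-contained; in a real project those prototypes come from the Rte.h generated for the configured ECU.

```c
#include <stdint.h>

typedef uint8_t Std_ReturnType;
#define E_OK ((Std_ReturnType)0u)

/* Stubs standing in for the project-generated RTE; hypothetical ports. */
int16_t stub_speed = 120; /* pretend sensor value delivered by the RTE */
int16_t stub_cmd;         /* last value written to the output port */

Std_ReturnType Rte_Read_SpeedIn_Value(int16_t *value)
{
    *value = stub_speed;
    return E_OK;
}

Std_ReturnType Rte_Write_CmdOut_Value(int16_t value)
{
    stub_cmd = value;
    return E_OK;
}

/* A periodic runnable: the generated RTE maps this onto an OS task. */
void SWC_Example_Step10ms(void)
{
    int16_t speed;
    if (Rte_Read_SpeedIn_Value(&speed) == E_OK) {
        /* Illustrative control law: command half the measured value. */
        (void)Rte_Write_CmdOut_Value((int16_t)(speed / 2));
    }
}
```

The point is the separation of concerns: the SWC only talks to its ports, and the generated BSW decides where the data actually comes from and goes to.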

The platform described thus far is the AUTOSAR classic platform, established nearly 15 years ago. This deeply embedded system is based on the OSEK standardized real-time operating kernel and can effectively run control applications throughout the entire vehicle. The AUTOSAR classic platform supports development up to the higher ASILs and targets deeply embedded systems, the focus of this article.

AUTOSAR adaptive for the modern, connected automobile

The AUTOSAR Adaptive Platform was introduced to meet the growing demands around ACES. It’s defined for infotainment systems, ADAS systems, and other computationally demanding applications. The AUTOSAR methodology makes the two platforms (classic and adaptive) inherently interoperable, easing vehicle system development and ensuring consistency.

Increasing multi-partitioning support over the years

As the automotive industry utilizes the advances in MCU design with multicore for embedded applications, AUTOSAR has added mechanisms to distribute computational load over multiple cores in both heterogeneous and homogeneous environments.

The AUTOSAR classic platform has steadily evolved its multi-partitioning support, with multiprocessing first available in AUTOSAR 4.0. With that release, SWCs could be distributed over multiple cores, supported mainly by the operating system and by the runtime environment (RTE), which handles inter-core communication. AUTOSAR 4.2 introduced parallelization patterns that enable BSW components to be distributed over different cores, along with mode management support to synchronize core startup and shutdown. MCAL multicore distribution in AUTOSAR 4.4 enabled effective access to hardware resources at the core level. AUTOSAR 4.5 added BSW distribution over multiple cores, focusing primarily on communication modules and their features.

AUTOSAR capability highlights

Application distribution

Application distribution (Figure 2) was initially supported in AUTOSAR 4.0, with gradual improvements over the years. This pattern is proven in production and in use on roads today. Using the inter-core communication specified by AUTOSAR allows users to create mixed-criticality systems, where SWCs of different criticality can be placed into different partitions or cores. Communication via the RTE is supported by the OS, but the communication between SWCs distributed on separate cores must be adequately minimized, since it is cross-core communication; if users send too many small messages, it will overload the system. Having the BSW in one partition is a roadblock, so this must be resolved.
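One common way to keep cross-core traffic down, sketched below with assumed signal names, is to group related signals into a single record so they cross the core boundary in one transfer instead of several:

```c
#include <stdint.h>

/* Grouping related signals into one record so they cross the core
 * boundary together; the signal names are illustrative. */
typedef struct {
    uint16_t wheel_speed[4]; /* FL, FR, RL, RR */
    uint16_t yaw_rate;
} ChassisSignals;            /* one cross-core message instead of five */

/* Cross-core transfers needed when signals are sent in groups of a
 * given size (simple ceiling division). */
uint32_t transfers_for(uint32_t signal_count, uint32_t group_size)
{
    return (signal_count + group_size - 1u) / group_size;
}
```

Sending the five chassis signals individually costs five cross-core transfers per cycle; grouped, it costs one.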

Figure 2. Application distribution: This pattern offers limited cross-core communication but is required for ASIL mixed-criticality applications. (Source: Siemens Digital Industries Software.)

Core with sole access to a resource

In this scenario, the core has sole access to a resource (Figure 3). This can be a single core, or it can be a BSW partition with exclusive access to the resource, which makes it similar to a single-core use case. No additional synchronization is needed, and the integrity of the communication and of access to the resource is secured.

Figure 3: Core with sole access to a resource: This is a single-core use case; no additional synchronization is required. (Source: Siemens Digital Industries Software.)

Virtualized access with a master satellite

Another pattern is virtualized access with a master-satellite implementation (Figure 4). Here, the BSW introduces a static broker providing a duplicated API to SWCs on each core. The API in the satellite might be reduced compared to the master, in which case users fall back to the original communication over the RTE for those APIs, which is still fully supported.

Figure 4: Virtualized access with a master satellite: Direct API access removes cross-core latency and cross-core synchronization is available in the background. (Source: Siemens Digital Industries Software.)

Direct access to an API on the local core/partition provides a fast return from the call and removes the cross-core/partition latency induced by a cross-core RTE representation. This is good for tight control loops, where users don’t want to lock up the application while waiting for a call to a remote core to return. The cross-core data synchronization between the BSW master and the satellite happens in the background. This background synchronization introduces a latency in the data coherency between master and satellites, so the frequency of the synchronization, and how the dataset is partitioned, are engineering decisions.
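The master/satellite idea can be sketched as follows: the satellite core answers API calls from a local shadow copy, and a cyclic background job keeps that copy synchronized with the master. All names here are hypothetical, and a real implementation would use a protected cross-core transport rather than a plain copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical master/satellite data; in a real BSW these would live
 * in memory owned by their respective cores. */
typedef struct {
    uint16_t vehicle_speed;
    uint16_t engine_rpm;
} ComData;

ComData master_data;    /* owned by the BSW master core */
ComData satellite_copy; /* shadow copy on the satellite core */

/* Fast local read: no cross-core call, so no blocking of the caller. */
uint16_t Satellite_GetVehicleSpeed(void)
{
    return satellite_copy.vehicle_speed;
}

/* Cyclic background synchronization job. The period of this job bounds
 * the data-coherency latency between master and satellite. */
void Satellite_SyncJob(void)
{
    memcpy(&satellite_copy, &master_data, sizeof master_data);
}
```

The trade-off is visible in the sync job: a shorter period tightens data coherency but consumes more cross-core bandwidth.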

Memory access with read only pattern

Finally, when users have software components that depend on the same data but run on different cores, they need to resolve the synchronization between these components (Figure 5). By having a consumer read data from a publisher with a single data access, engineers are assured of consistency. With a cross-core/partition read, users must be assured that the data is not modified during the read.

Figure 5: Memory access with read-only pattern: The consumer reads data from the published memory. With the cross-partition read, the data must not be modified. (Source: Siemens Digital Industries Software.)

This is dependent on the hardware structure, so users need to engineer their solution and identify the size of the dataset used for communication. If the dataset is larger than what the hardware can guarantee as an atomic operation, users will need to introduce data protection with, for example, spin locks.
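As a sketch of that fallback, the C11 snippet below guards a dataset wider than a 32-bit MCU’s atomic width with a spin lock. It is illustrative only; a production AUTOSAR system would use the platform’s own exclusive-area or spin-lock services rather than raw C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

/* An 8-byte dataset: wider than a single atomic access on a typical
 * 32-bit MCU, so a lock is needed for a consistent read. */
typedef struct {
    uint32_t position;
    uint32_t velocity;
} SharedState;

SharedState shared;
atomic_flag state_lock = ATOMIC_FLAG_INIT;

void publisher_write(uint32_t pos, uint32_t vel)
{
    while (atomic_flag_test_and_set_explicit(&state_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
    shared.position = pos;
    shared.velocity = vel;
    atomic_flag_clear_explicit(&state_lock, memory_order_release);
}

SharedState consumer_read(void)
{
    SharedState snapshot;
    while (atomic_flag_test_and_set_explicit(&state_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
    snapshot = shared;                /* copy both fields under the lock */
    atomic_flag_clear_explicit(&state_lock, memory_order_release);
    return snapshot;
}
```

If the dataset fit in a single hardware-atomic word, the lock could be dropped entirely, which is exactly the case the read-only pattern targets.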

Putting it all together

How do we then apply this in an AUTOSAR system and in the development flow? This is where AUTOSAR and tooling provide obvious benefits. The ECU setup is applied, treated as a single-core workflow, and then imported to generate the configuration enabling the necessary services.

The user then defines the partitions distributed over the MCU cores. Based on the communication partitioning made at the application level, the SWCs are allocated across the partitions accordingly. After assessing the configuration, the tooling generates the base system to ensure that resources are correctly accessed.

In practice, what improvements can applying BSW partitioning provide? In a customer example of a multicore MCU executing the BSW on a single core, with the SWCs distributed over several cores, the BSW core load was around 98 percent. To mitigate the overload, the BSW was partitioned and the modules supporting the FlexRay communication bus were moved to a core with available capacity. The SWCs with tight affinity for communication over the FlexRay bus were also moved to the same core.

Moving the FlexRay BSW modules and the related SWCs provided the necessary and correct partitioning. This made the BSW core load drop to 45 percent. The improved BSW and SWC distribution dropped the MCU combined processor load by two percent overall. So, the AUTOSAR architecture with the Siemens-specific enhancement provides support for core load balancing.


The auto industry is moving towards a centralized computational architecture for complex vehicle features required by customers and by new legislation. These architectures heavily rely on the latest multicore MCUs to realize the necessary computational power.

The requirement that software efficiently use the available hardware resources is integral to the software-defined vehicle. AUTOSAR solutions such as Siemens’ are a natural choice for the embedded system running on a multicore MCU, deploying processes with different resource dependencies while maintaining focus on the application and its integration.

It’s important to adopt sound engineering practices when distributing applications across parallel resources and allocating shared resources correctly. Even with effective and capable software for parallelization, users need to maintain their embedded systems engineering capabilities. To be successful, engineers must pursue efficient partitioning and smart development processes for their specific applications.

— Mathias Fritzson is the product manager at Siemens
