Hardware security modules unleash AUTOSAR

Article by: Tobias Jordan

A hardware security module future-proofs an ECU’s cryptography.

Increasingly complex software and in-vehicle connectivity require more and more cryptographic protection. This protection must also be implemented by classic real-time AUTOSAR systems. Hardware security modules (HSM) with suitable firmware future-proof your system’s cryptography, even when resources are scarce.

The degree of connectivity of ECUs in automobiles has been growing for years, with the control units being connected both to one another and to the outside world. However, as increasingly complex software is facing new requirements, the requirements for communication are increasing as well. From a security perspective, this means a change from isolated single systems to highly connected nodes. Accordingly, security and protection against external threats are gaining importance. Among other things, this protection can be achieved by increasing the use of strong cryptography – something that, at first glance, is not easy in classic real-time systems. Cryptography used to be necessary only in special operating modes such as ECU software updates in the repair shop.

Nowadays, it must be efficiently usable even during regular run-time: for example, to authenticate communication partners and communication contents and to prevent interception. The implementation must still satisfy the system’s real-time requirements. At first glance, this conflicts with the computation time that cryptographic methods require. Hardware security modules (HSM) are used to address this problem. They allow cryptography to be computed on a separate processor. However, they need an optimized implementation to perform to their full potential.

Automotive real time
The requirements for in-vehicle real time generally result from a so-called chain of effects. This chain describes which hardware or software components are involved, and in which order, in processing a signal from the input pulse of a sensor to the desired output pulse of the relevant actuator. The longest acceptable run-time of the entire chain can be derived from the physical conditions of the sensor and actuator. It can then be broken down into allowed maximum run-times of the individual components. Depending on the use case, they range from 10 µs to 100 ms. If the allowed maximum run-time is exceeded, this can impair convenience and cause, for example, noise. For safety-relevant functions that require real-time capabilities, it may even put life and limb at risk.

In order to meet the resulting real-time requirements in software, an AUTOSAR operating system divides the relevant software functions into sub-steps, so-called tasks. Tasks are activated by external events, i.e., marked as runnable, and then processed based on priority. For most applications, a timer with a cyclic activation table is used to activate the tasks, because the relevant control functions have to process newly generated signal values continuously. This results in a fixed timing pattern that repeats itself in cycles of a few tens of milliseconds. Several steps are necessary to ensure that the real-time requirements are met. On the one hand, the signal processing steps must be implemented such that they can provide the required time response. On the other hand, the time-related task activation in the ECU must be capable of implementing the time requirements. To this end, the run-time behavior of all tasks must be analyzed to find an appropriate prioritization and a chronological activation order that guarantees timeliness.
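The cyclic activation described above can be sketched in a few lines of Python. This is a simplified model, not AUTOSAR OS code: the task names, periods and priorities are hypothetical, a higher number is assumed to mean a higher priority, and task run-times and preemption are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    period_ms: int  # activation period from the cyclic activation table
    priority: int   # assumption: higher number = higher priority


# Hypothetical task set, for illustration only.
TASKS = [
    Task("diag_20ms", 20, 1),
    Task("control_5ms", 5, 3),
    Task("comm_10ms", 10, 2),
]


def activation_order(tasks, tick_ms):
    """Return the names of the tasks activated at tick_ms, highest priority first."""
    ready = [t for t in tasks if tick_ms % t.period_ms == 0]
    return [t.name for t in sorted(ready, key=lambda t: -t.priority)]
```

Calling `activation_order(TASKS, tick)` for ticks 0 to 19 yields the repeating pattern of one 20 ms cycle. A real timing analysis would additionally account for the run-time of each task.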

AUTOSAR basic software
ECU functions are no longer just involved in signal processing. The basic software that handles, among other things, communication and management constitutes a significant portion of these functions. This basic software, too, must be divided into tasks, analyzed in terms of its run-time requirements and considered in the overall system. In simplified terms, there are two different function sets: One part of the basic software, for example for communicating signal values, is necessary to enable on-time signal processing. This part must be given an appropriately high priority. The other part has significantly lower real-time requirements: This part must run regularly in the background; however, it may be briefly preempted by functions with harder real-time requirements. Accordingly, a common implementation pattern is to move these non-critical functions to a low-priority background task. In this context, the problem is the prioritization of the functions among one another: Generally, there are multiple background activities, none of which may be starved for a longer period. The solution to this prioritization is round-robin scheduling, where each of the low-priority functions computes for a short period, followed by the next function. In AUTOSAR terminology, each of the modules involved provides a so-called “MainFunction” with limited run-time. In the background task, these functions are called sequentially.
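The round-robin pattern can be illustrated with a minimal Python sketch. The module names used in the note after the code are only examples, and the functions here merely record that they were called; a real MainFunction would perform a bounded amount of work per call.

```python
def make_main_function(module_name, call_log):
    """Create a stand-in MainFunction that only records its invocation."""
    def main_function():
        call_log.append(module_name)
    return main_function


def background_task(main_functions, rounds):
    """Call every MainFunction once per round, always in the same order."""
    for _ in range(rounds):
        for main_function in main_functions:
            main_function()
```

With three stand-in functions for, say, `Csm`, `NvM` and `Dem`, two rounds produce the call order Csm, NvM, Dem, Csm, NvM, Dem. No function is skipped, but each one waits for all others before it runs again.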

Cryptographic algorithms are one example of such a classic “background function”. Since release 4.0, AUTOSAR has contained specifications for cryptographic basic software; these specifications were revised and fine-tuned in later releases. In this context, the so-called Crypto Service Manager (CSM) provides cryptographic services to the application:

Figure 1  Security modules in an AUTOSAR 4.3 system

For example, it is used by Secure Onboard Communication (SecOC) that cryptographically authenticates the signal values in transmitted data packets. As the name already implies, CSM manages the cryptographic services. Their actual implementation is located in a manufacturer-specific crypto library. To support integration into an AUTOSAR system with real-time requirements, CSM processes requests asynchronously: Initially, they are only saved and then processed piece by piece in calls of the CSM MainFunction. To this end, the CSM MainFunction, for its part, calls MainFunctions of all underlying cryptographic primitives, each of which then computes a few steps.

Computation times of typical cryptographic primitives exceed those of signal processing functions by orders of magnitude. This poses a dilemma when dividing the work into MainFunction calls: How much computation is allowed per call? Too many computation steps per call would jeopardize the overall system’s real-time properties and thus make the cryptography unusable. Too few steps would delay longer cryptographic operations to such an extent that their benefit would be limited.

In addition, another effect has to be considered: It is the specific MainFunction’s responsibility to manage its internal computation state such that it allows step-by-step execution of the computation. This state management, for its part, implies an overhead. The less a single call actually computes, the greater the relative overhead.

It is up to the software supplier to find a standard solution to this problem. In classic scenarios, cryptography at run-time is, for example, used to authenticate smaller data blocks with symmetric encryption methods. One solution may be to compute one symmetric block for each MainFunction call. However, when using more complex methods, it is difficult to find a sensible compromise.
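The following Python sketch illustrates step-by-step execution with explicit state management. It is not the AUTOSAR CSM API: the class and its names are hypothetical, HMAC-SHA-256 from the Python standard library stands in for whatever symmetric authentication method a real crypto library uses, and one 64-byte block per call is an assumed step size.

```python
import hashlib
import hmac


class SteppedMacJob:
    """Computes a MAC over `data`, one block per main_function() call."""

    BLOCK_SIZE = 64  # bytes processed per call (assumption)

    def __init__(self, key, data):
        # State that has to survive between calls: context, data, position.
        self._ctx = hmac.new(key, digestmod=hashlib.sha256)
        self._data = data
        self._offset = 0
        self.calls = 0
        self.result = None  # set once the last block has been processed

    def main_function(self):
        if self.result is not None:
            return  # job already finished, nothing to do
        self.calls += 1
        block = self._data[self._offset:self._offset + self.BLOCK_SIZE]
        self._ctx.update(block)
        self._offset += self.BLOCK_SIZE
        if self._offset >= len(self._data):
            self.result = self._ctx.digest()


def run_to_completion(job):
    """Call main_function() until the job reports a result."""
    while job.result is None:
        job.main_function()
    return job.result
```

For 200 bytes of input, the job needs four calls. In a real system, other MainFunctions and higher-priority tasks run between those calls, which is where the delay discussed in this section comes from.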

Examples include authenticating/encrypting larger amounts of data or generating asymmetric signatures. For example, let us assume that computing once per millisecond for 100 µs is acceptable – a very optimistic assumption that would be difficult to maintain in most real-time scenarios.

Compared to the pure computation time, such a division means a tenfold increase in the time until a result is available. For cryptographic functions whose pure computation times already cause difficulties, such a division significantly limits the practical use.
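The tenfold increase follows directly from the ratio of period to slice length. A small Python sketch makes the arithmetic explicit; the function name is hypothetical, whole periods are counted, and the state-management overhead mentioned earlier is ignored.

```python
def time_to_result_us(compute_us, slice_us, period_us):
    """Approximate elapsed time if the work runs in one slice per period.

    Counts whole periods and ignores any per-call management overhead.
    """
    slices = (compute_us + slice_us - 1) // slice_us  # integer ceiling
    return slices * period_us
```

For a pure computation time of 10 ms (10,000 µs) with 100 µs slices once per millisecond, the function returns 100,000 µs, i.e. 100 ms until the result is available.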

Hardware security modules
Clearly, the conflicting requirements in terms of real-time capability and overhead for cryptographic methods cannot be solved by software alone. Therefore, an obvious solution is to use specialized hardware that can compute the appropriate algorithms – or large portions of them – in parallel to the main processor. Then AUTOSAR CSM and the associated cryptographic libraries pass requests only to this hardware and, in their MainFunction, cyclically check if a result is available. The first of these hardware coprocessors were specified as “Secure Hardware Extension” by the Manufacturers’ Software Initiative (HIS) as early as the last decade. Regarding cryptographic algorithms, this specification is still limited to the implementation of AES-128 in different modes. More recent developments show that, due to the large number of possible use cases, a pure hardware coprocessor is often too limited and therefore not ideal.

The result is a trend towards so-called hardware security modules (HSM). HSMs are stand-alone microcontrollers that are connected to the host system’s buses by a kind of firewall. The HSM normally has its own protected RAM, an exclusive flash area for program code and data and its own peripherals such as timers, hardware accelerators for some cryptographic algorithms or generators for true random numbers. It is capable of accessing the host’s complete hardware. This allows the implementation of secure, authenticated startup of the system or host monitoring at run-time. The exclusive data flash memory can be used to store secrets such that they cannot be accessed by the host system. This means that the host can request a cryptographic operation that is performed by the HSM without the key leaving the HSM. In this respect, however, the special advantage of the HSM is that it is freely programmable. As a stand-alone microcontroller, the HSM is capable of running any program code optimized for the current use case. This enables it to implement far more security requirements than a simple coprocessor.
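The interaction between host and HSM can be modeled roughly as follows. This Python sketch is a plain simulation under stated assumptions: the class `SimulatedHsm` and its methods are hypothetical, HMAC-SHA-256 stands in for the actual algorithm, and `run()` represents work that real hardware would perform in parallel to the host. Python attributes also offer no real isolation; the point is only that the host interface refers to keys by identifier and never receives key material.

```python
import hashlib
import hmac


class SimulatedHsm:
    """Toy model of an HSM that keeps keys internal and answers requests."""

    def __init__(self, keys):
        self._keys = dict(keys)  # stands in for the HSM-exclusive data flash
        self._pending = []       # requests received from the host
        self._results = {}       # finished results, by request id
        self._next_id = 0

    def submit_mac(self, key_id, message):
        """Host side: queue a MAC request and return a request id."""
        request_id = self._next_id
        self._next_id += 1
        self._pending.append((request_id, key_id, message))
        return request_id

    def run(self):
        """HSM side: process all queued requests."""
        while self._pending:
            request_id, key_id, message = self._pending.pop(0)
            mac = hmac.new(self._keys[key_id], message, hashlib.sha256)
            self._results[request_id] = mac.digest()

    def poll(self, request_id):
        """Host side: return the result, or None if it is not ready yet."""
        return self._results.get(request_id)
```

A host-side MainFunction would call `poll()` cyclically and simply return while the answer is still `None`, so it consumes almost no host CPU time while the HSM computes.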

Implementation of HSM firmware
It may seem tempting to simply implement well-established AUTOSAR standard software on the HSM and use standard AUTOSAR methods to connect it to the environment. This would allow reuse of familiar AUTOSAR implementation patterns. However, this approach is a poor fit: The use cases of a typical AUTOSAR system with real-time signal processing and of an HSM that focuses on security differ considerably. HSM firmware can therefore achieve significantly more efficiency if it is optimized more freely for its purpose. Moreover, only limited resources are available on current HSM hardware – another factor that makes using AUTOSAR software on the HSM difficult.

HSM use cases are typically classic client-server models: The host sends one or more requests to the HSM, where they are processed, and is notified as soon as a result is available. Unlike a classic AUTOSAR system, the number of management and background tasks on an HSM is very limited. One can therefore assume that a large portion of the HSM computation time will be dedicated to processing host requests.

When an operating system that allows interrupts is used on the HSM, task mapping and prioritization can be optimized accordingly. In this way, long-running operations can be processed in low-priority tasks while shorter operations may interrupt them. A high-priority, cyclic task handles management functions, if these are necessary. With this type of mapping, the cryptographic routines can be implemented such that they perform their specific task in a single pass. The routine itself no longer has to provide points at which it can be interrupted. Instead, the operating system interrupts it, transparently to the routine. The management overhead associated with step-by-step execution is no longer necessary, which significantly reduces both code size and run-time. Furthermore, interrupts occur only when actually necessary – if only a single operation has to be processed, it will be computed without interruption. Figure 2 shows the architecture of the resulting HSM firmware.

Figure 2  Architecture of optimized HSM firmware
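The effect of priority-based preemption on the HSM can be illustrated with a small discrete-time simulation in Python. It is a sketch under simplifying assumptions, not firmware code: time advances in abstract ticks, a higher number means a higher priority, and the job names and durations are made up.

```python
def simulate(jobs):
    """Simulate preemptive priority scheduling.

    jobs: list of (name, arrival_tick, duration_ticks, priority) tuples.
    Returns a list with the name of the job that ran in each tick.
    """
    remaining = {name: duration for name, _, duration, _ in jobs}
    timeline = []
    tick = 0
    while any(remaining.values()):
        ready = [
            (priority, name)
            for name, arrival, _, priority in jobs
            if arrival <= tick and remaining[name] > 0
        ]
        if ready:
            _, name = max(ready)  # highest priority wins
            remaining[name] -= 1
            timeline.append(name)
        else:
            timeline.append("idle")
        tick += 1
    return timeline
```

With a long low-priority job `("rsa_verify", 0, 5, 1)` and a short high-priority job `("mac_check", 2, 2, 2)`, the timeline shows the RSA job running for two ticks, being interrupted for two ticks, and then finishing. If the RSA job is the only one, it runs from start to end without interruption, and the routine itself contains no code for pausing or resuming.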

Potential for optimization
The limited performance of HSM hardware compared to the host system constrains what optimization of the HSM software can achieve. For this reason, pure software implementations on the HSM should theoretically take longer than on the host. In practice, however, the savings due to optimization prevail. Figure 3 shows measurements for RSA on a host system compared to the associated, significantly slower HSM. Here, “AUTOSAR software algorithm” refers to the software implementation according to AUTOSAR as it is used on the host. To measure the pure computation time, the corresponding MainFunction was called in a loop that does not terminate until the computation is complete. As a result, the total CPU time is available for RSA computation. The effects that result from cyclically processing the MainFunction and the corresponding allocation of CPU time in a real system were considered in a scenario where the MainFunction can use one tenth of the CPU time. Both the computation time of the unmodified AUTOSAR implementation and that of an optimized implementation were measured on the HSM. The latter has no AUTOSAR-specific management overhead.

Figure 3  RSA signature verification run-times on host and HSM

In the example of 3,072-bit RSA signature verification (assumed exponent: 17), the host system requires 525 ms of pure computation time. Assuming that one tenth of the CPU time is actually available for cryptography, this results in more than 5 s until a result is available. On the HSM, the same algorithm has a computation time of 1,276 ms. On the one hand, this shows that reduced hardware performance significantly extends computation times. On the other hand, as the HSM can use almost all the CPU time to compute the appropriate function, only a quarter of the time – compared to the host scenario – passes until the result is available. What is interesting now is the optimized algorithm on the HSM: It requires only 236 ms. Using this optimization, the HSM firmware thus delivers the result one order of magnitude faster than the host – in a time frame that corresponds to that of real-time signal processing.
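The comparison can be reproduced from the figures quoted above. In this Python sketch, the three computation times and the 10 % host share are taken from the text; treating the HSM as having 100 % of its CPU time available is a simplifying assumption for "almost all the CPU time".

```python
# Pure computation times from the measurements quoted in the text.
HOST_COMPUTE_MS = 525
HSM_AUTOSAR_MS = 1276
HSM_OPTIMIZED_MS = 236

HOST_CPU_SHARE_PERCENT = 10   # one tenth of host CPU time for cryptography
HSM_CPU_SHARE_PERCENT = 100   # assumption: HSM is fully available


def time_to_result_ms(compute_ms, cpu_share_percent):
    """Elapsed time until the result is available for a given CPU share."""
    return compute_ms * 100 / cpu_share_percent


host_ms = time_to_result_ms(HOST_COMPUTE_MS, HOST_CPU_SHARE_PERCENT)
hsm_autosar_ms = time_to_result_ms(HSM_AUTOSAR_MS, HSM_CPU_SHARE_PERCENT)
hsm_optimized_ms = time_to_result_ms(HSM_OPTIMIZED_MS, HSM_CPU_SHARE_PERCENT)

# Ratios relative to the host scenario.
autosar_fraction_of_host = hsm_autosar_ms / host_ms
optimized_speedup = host_ms / hsm_optimized_ms
```

Under these assumptions, the host needs 5,250 ms, the unmodified implementation on the HSM takes about 24 % of that time, and the optimized HSM implementation is roughly 22 times faster than the host scenario.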

Time for cryptography
Using an HSM with optimized firmware therefore unlocks the possibility of future-proof cryptography also for classic real-time ECUs. Even asymmetric encryption methods are no longer limited to special operating modes, but can be computed and used at run-time with acceptable performance. This opens up cryptographic use cases that were previously impossible to implement. The HSM’s free programmability enables application-specific optimizations in terms of selecting the required methods and their implementation.

Tobias Jordan has been working on automotive basic software for the last 11 years. He is an expert in operating systems and security at Elektrobit Automotive. Thomas Hohnstein has been working on embedded software for 10 years and is now the head of technology center security at Elektrobit Automotive.



