Innovations driving the future of cryptography in hardware processors

Article by: Wajdi Feghali, Intel

The hardware industry has been working to produce new guidelines, microarchitectural enhancements and innovative software optimization methods.

It’s quite probable that in the future everything will be encrypted, from your grocery list to your medical records. This is an exciting notion, but the field of cryptography is particularly unsettled, and there’s a lot of work being done now to ensure that data can be secured well into the future.

Because data is cryptographically protected at multiple layers of the software, network and storage stacks, several cryptographic operations may be applied to every byte. These operations support critical business functions that require strong security, yet at the hardware level they are among the most compute-intensive workloads a processor runs. Demand for cryptographic computation also keeps growing: the amount of data generated each year is rising exponentially, and organizations are adopting larger key sizes and running multiple cryptographic algorithms simultaneously to bolster security.

To reduce the cost of cryptographic computation, the hardware industry has been working to produce new guidelines, microarchitectural enhancements and innovative software optimization methods. Notable examples over the years include fixed-function processor instructions that reduced the compute requirements of Advanced Encryption Standard (AES) symmetric encryption and, more recently, of other FIPS algorithms. As a result, over the last 10 years organizations have become increasingly committed to deploying strong cryptographic ciphers to better secure data and communications.

But as advances in quantum computing accelerate, the security of both symmetric and asymmetric encryption algorithms may be in jeopardy. Increasing key sizes (from 128 to 256 bits) can make symmetric algorithms such as AES more resilient to quantum attacks, but again at a higher compute cost. Asymmetric algorithms such as RSA and ECDSA fare worse: a sufficiently large quantum computer running Shor’s algorithm could break them regardless of key size. Many have said the raw power of quantum computers will be the death of encryption, but we don’t believe that will be the case.
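A rough worked estimate shows why doubling the symmetric key size helps. It assumes Grover’s search algorithm (not named in the article) as the quantum attack on symmetric keys, and it ignores Grover’s large constant factors and poor parallelizability:

```latex
\begin{aligned}
\text{Classical brute force on an } n\text{-bit key:}\quad & 2^{n} \text{ trials} \\
\text{Grover's search:}\quad & \sqrt{2^{n}} = 2^{n/2} \text{ quantum iterations} \\
\text{AES-128:}\quad & 2^{128/2} = 2^{64} \\
\text{AES-256:}\quad & 2^{256/2} = 2^{128}
\end{aligned}
```

Under this estimate, AES-256 facing a quantum attacker keeps roughly the 128-bit margin that AES-128 offers against classical attacks, at the price of 14 rather than 10 AES rounds per block.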

These incumbent encryption schemes will likely be supplanted by new post-quantum cryptographic approaches, and the industry is actively working to transition to new cryptography standards that address these impending security challenges. Many proposals have already been submitted to the NIST Post-Quantum Cryptography (PQC) standardization process, and they vary widely in key size, storage and computational requirements.

As the dawn of the quantum computing era approaches, the industry will need to rally together to move toward new methods and standards.

What will that shift look like? The transition will be lengthy, and existing cryptography will remain in place until the industry is able to fully adopt emerging quantum-resistant algorithms. We expect the transition, and the new algorithms themselves, to impose a high computational burden, and organizations will likely not adopt stronger encryption broadly until the underlying post-quantum algorithms become economically sustainable from a compute performance perspective.

To accelerate the adoption of tomorrow’s cryptography, the industry will need to develop inventive hardware improvements and optimized software solutions that work together to shrink compute requirements. The good news is that we’re not starting from scratch.

Here are six key examples of cryptographic performance improvements and innovation taking place today:

1. Transport Layer Security (TLS) Cryptographic Algorithms: TLS operates in two phases. The first is session initiation, or the handshake, in which the client and server use public-key cryptography (often RSA) to establish a shared secret key; with RSA key exchange, for example, the client encrypts secret material under the server’s public key. RSA is based on modular exponentiation, a compute-intensive operation that accounts for most of the processor cycles spent on TLS session initiation. Combining RSA authentication with an ephemeral Elliptic Curve Cryptography (ECC) key exchange such as ECDHE provides perfect forward secrecy and therefore even greater security.
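To make that cost concrete, here is a minimal Python sketch of modular exponentiation by left-to-right square-and-multiply, one standard way to compute it (the article does not say which method any particular library uses). The function name mod_exp and the tiny textbook key (p = 61, q = 53) are illustrative assumptions; this toy is not constant-time and must never be used for real keys.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent % modulus (exponent >= 0) by square-and-multiply."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:               # exponent bits, most significant first
        result = (result * result) % modulus    # one modular squaring per bit
        if bit == "1":
            result = (result * base) % modulus  # one extra multiply per set bit
    return result


# Toy RSA key: n = 61 * 53 = 3233, public exponent e = 17, private exponent d = 2753.
n, e, d = 3233, 17, 2753
ciphertext = mod_exp(65, e, n)         # "encrypt" the message 65
plaintext = mod_exp(ciphertext, d, n)  # decrypt with the private exponent
print(ciphertext, plaintext)           # 2790 65
```

With a 2048-bit private exponent the loop runs about 2048 times, each iteration squaring a 2048-bit number modulo a 2048-bit modulus, which is why this step dominates handshake cycles. Production libraries speed it up with techniques such as the Chinese Remainder Theorem and Montgomery multiplication.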

In the second phase, the bulk data is transferred. The protocol encrypts data records to ensure confidentiality and uses a message authentication code (MAC), based on a cryptographic hash of the data, to detect any attempt to modify the data in transit. Because both encryption and authentication protect TLS bulk data transfers, stitching the two together can often increase overall performance. Some cipher suites, such as AES-GCM, even define combined “encryption + authentication” modes.
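The MAC step can be illustrated with Python’s standard hmac and hashlib modules. This is a simplified sketch: the helper names tag_record and verify_record are hypothetical, real TLS 1.2 MACs also cover a sequence number and record header, and TLS 1.3 uses only combined (AEAD) modes such as AES-GCM.

```python
import hashlib
import hmac


def tag_record(key: bytes, record: bytes) -> bytes:
    """HMAC-SHA256 tag over one record (simplified: no sequence number or header)."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify_record(key: bytes, record: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(tag_record(key, record), tag)


key = bytes(range(32))               # placeholder key; TLS derives it from the handshake
record = b"GET /index.html"
tag = tag_record(key, record)
print(verify_record(key, record, tag))              # True
print(verify_record(key, b"GET /admin.html", tag))  # False: modified in transit
```

Any change to the record changes the recomputed tag, so the receiver rejects it.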

2. Public-Key Cryptography: To improve the performance of the “big number” multiplication at the heart of public-key ciphers, some vendors are creating new instruction sets. For example, Intel’s Ice Lake-based processors introduced support for the AVX512 Integer Fused Multiply Add (AVX512_IFMA) Instruction Set Architecture (ISA) extension. These instructions multiply eight pairs of 52-bit unsigned integers held in the 64-bit lanes of the wide 512-bit (ZMM) registers; one instruction adds the low 52 bits of each 104-bit product to a 64-bit accumulator, and a companion instruction adds the high 52 bits. Combined with software optimization techniques such as multi-buffer processing, these instructions can significantly improve performance not only for RSA but for ECC as well.
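A Python emulation can clarify what these instructions compute. This is a sketch, not Intel’s code: madd52lo and madd52hi model VPMADD52LUQ and VPMADD52HUQ (intrinsics _mm512_madd52lo_epu64 and _mm512_madd52hi_epu64) lane by lane, and bignum_mul is a hypothetical schoolbook multiplier showing why 52-bit limbs in 64-bit accumulators are useful: the 12 spare bits absorb carries, so carry propagation can be deferred to one final pass.

```python
LANES = 8                      # a 512-bit ZMM register holds eight 64-bit lanes
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc, b, c):
    """Per lane: acc + low 52 bits of (b[51:0] * c[51:0]), wrapping at 64 bits."""
    return [(a + (((x & MASK52) * (y & MASK52)) & MASK52)) & MASK64
            for a, x, y in zip(acc, b, c)]


def madd52hi(acc, b, c):
    """Per lane: acc + bits 103:52 of (b[51:0] * c[51:0]), wrapping at 64 bits."""
    return [(a + (((x & MASK52) * (y & MASK52)) >> 52)) & MASK64
            for a, x, y in zip(acc, b, c)]


def bignum_mul(x, y, limbs):
    """Multiply two numbers split into 52-bit limbs (uses one lane for clarity)."""
    a = [(x >> (52 * i)) & MASK52 for i in range(limbs)]
    b = [(y >> (52 * i)) & MASK52 for i in range(limbs)]
    cols = [0] * (2 * limbs)   # 64-bit column accumulators
    for i in range(limbs):
        for j in range(limbs):
            cols[i + j] = madd52lo([cols[i + j]], [a[i]], [b[j]])[0]
            cols[i + j + 1] = madd52hi([cols[i + j + 1]], [a[i]], [b[j]])[0]
    result, carry = 0, 0
    for k, col in enumerate(cols):   # single deferred carry-propagation pass
        total = col + carry
        result |= (total & MASK52) << (52 * k)
        carry = total >> 52
    return result


# Eight lanes at once: each lane's full 104-bit product is hi * 2**52 + lo.
xs = [MASK52 - k for k in range(LANES)]
ys = [k + 1 for k in range(LANES)]
lo = madd52lo([0] * LANES, xs, ys)
hi = madd52hi([0] * LANES, xs, ys)
print(all((h << 52) + l == x * y for l, h, x, y in zip(lo, hi, xs, ys)))  # True

u, v = 3 ** 200, 5 ** 150                  # both fit in 8 limbs (416 bits)
print(bignum_mul(u, v, limbs=8) == u * v)  # True
```

Real implementations spread independent limb products across all eight lanes rather than using one lane as this sketch does.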

3. Symmetric Encryption: Two instruction enhancements increase AES performance: vectorized AES (VAES) and vectorized carryless multiply (VPCLMULQDQ). The VAES instructions process up to four 128-bit AES blocks at a time using the wide 512-bit (ZMM) registers and, when properly utilized, can benefit all AES modes of operation. Vectorized carryless multiply similarly performs up to four carryless multiplications at a time in the ZMM registers, speeding up the Galois field hashing (GHASH) used by the widely deployed AES-GCM cipher.
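Carryless multiplication is ordinary schoolbook multiplication with the partial products combined by XOR instead of addition, that is, polynomial multiplication over GF(2). The Python sketch below is a simplification: clmul and clmul_x4 are hypothetical names, and the real instruction also uses an immediate to select which 64-bit half of each 128-bit lane to multiply. GHASH then reduces the products modulo x^128 + x^7 + x^2 + x + 1.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply: add shifted copies of a with XOR, so no carries propagate."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_x4(a_lanes, b_lanes):
    """Four independent 64 x 64 -> 128-bit products, one per 128-bit lane of a ZMM register."""
    assert len(a_lanes) == len(b_lanes) == 4
    return [clmul(a, b) for a, b in zip(a_lanes, b_lanes)]


# (x^2 + x + 1)(x + 1) = x^3 + 1 over GF(2), versus ordinary 7 * 3 = 21.
print(clmul(0b111, 0b11), 7 * 3)   # 9 21
```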

4. Hashing: Dedicated instruction extensions for the Secure Hash Algorithm (SHA) can boost compute performance. SHA-256, for example, digests input of arbitrary size into a fixed 256-bit value, and these extensions significantly improve SHA-256 performance, allowing more cryptographic hashing to be employed.
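The fixed output size, and the per-block work that such extensions accelerate, can be seen with Python’s standard hashlib module (which typically delegates to OpenSSL, and OpenSSL may use SHA instructions on CPUs that support them). The helper sha256_blocks is a hypothetical name; it counts the 64-byte blocks the SHA-256 compression function processes under the FIPS 180-4 padding rule.

```python
import hashlib


def sha256_blocks(message_len: int) -> int:
    """64-byte blocks compressed for a message of message_len bytes:
    data + one 0x80 padding byte + zero fill + an 8-byte length field."""
    return (message_len + 9 + 63) // 64


for size in (0, 3, 55, 56, 1_000_000):
    digest = hashlib.sha256(b"\x00" * size).digest()
    print(size, "bytes ->", len(digest) * 8, "bit digest,", sha256_blocks(size), "blocks")
```

The digest is always 256 bits, while the work grows with the number of blocks, which is the part hardware acceleration targets.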

5. Function Stitching: Pioneered in 2010, function stitching takes two algorithms that typically run together but sequentially, such as AES-CBC and SHA-256, and fuses them into a single optimized routine that maximizes use of processor resources and throughput. The result is a fine-grained interleaving of the instructions from each algorithm so that both execute simultaneously. Processor execution units that would otherwise sit idle while running a single algorithm, because of data dependencies or instruction latencies, can then execute instructions from the other algorithm, and vice versa. This matters because each algorithm still contains strict dependencies that modern microprocessors cannot fully parallelize.
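The benefit can be illustrated with a deliberately simplified scheduling model, which is an assumption and not a model of any real processor: each algorithm is a chain of dependent instructions with a fixed latency, the two chains are independent of each other, and the core issues at most issue_width ready instructions per cycle. The function simulate and its parameters are hypothetical.

```python
def simulate(chains, latency, issue_width=1):
    """Return the cycle at which the last instruction completes.

    chains: lengths of independent dependency chains (one per algorithm).
    Inside a chain, each instruction waits `latency` cycles for its predecessor.
    """
    remaining = list(chains)
    ready_at = [0] * len(chains)   # earliest issue cycle of each chain's next instruction
    cycle, finish = 0, 0
    while any(remaining):
        issued = 0
        for i in range(len(remaining)):
            if issued == issue_width:
                break
            if remaining[i] and ready_at[i] <= cycle:
                remaining[i] -= 1
                ready_at[i] = cycle + latency
                finish = max(finish, cycle + latency)
                issued += 1
        cycle += 1
    return finish


# Two algorithms, 4 dependent instructions each, 3-cycle latency.
sequential = simulate([4], latency=3) + simulate([4], latency=3)
stitched = simulate([4, 4], latency=3)
print(sequential, stitched)   # 24 13
```

With latency=1 the sequential and stitched results are equal, since a single-issue core then has no idle slots to fill; the gain appears only when latencies leave gaps, as described above.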

6. Multi-Buffer: Multi-buffer processing is an efficient technique for running a cryptographic algorithm over multiple independent data buffers in parallel. Vendors have already applied it to algorithms such as hashing and symmetric encryption. Processing several buffers simultaneously can yield significant performance gains, both where the code can use single instruction, multiple data (SIMD) instructions such as AVX, AVX2 and AVX512 and, by filling otherwise idle execution slots, even where it cannot. The technique grows more important as the volume of data requiring cryptographic processing rises, and the availability of wider processor data paths will let the industry keep pace.
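A minimal sketch of the multi-buffer data layout follows. It uses the simple, non-cryptographic FNV-1a hash purely as a stand-in to keep the example short (an assumption; real multi-buffer code applies the same idea to algorithms such as SHA-256). Each element of the hypothetical lanes list plays the role of one SIMD lane, and lanes whose buffers are exhausted are masked off.

```python
FNV_OFFSET = 2166136261   # 32-bit FNV-1a offset basis
FNV_PRIME = 16777619      # 32-bit FNV prime
MASK32 = 0xFFFFFFFF


def fnv1a_single(data: bytes) -> int:
    """Reference: hash one buffer at a time."""
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & MASK32
    return h


def fnv1a_multi_buffer(buffers):
    """Hash all buffers in lockstep: step k updates every lane that still has a k-th byte."""
    lanes = [FNV_OFFSET] * len(buffers)          # one hash state per lane
    longest = max((len(b) for b in buffers), default=0)
    for k in range(longest):
        for lane, buf in enumerate(buffers):     # a single vector op in SIMD code
            if k < len(buf):                     # mask off lanes whose buffer is done
                lanes[lane] = ((lanes[lane] ^ buf[k]) * FNV_PRIME) & MASK32
    return lanes


buffers = [b"alpha", b"be", b"", b"gamma ray"]
print(fnv1a_multi_buffer(buffers) == [fnv1a_single(b) for b in buffers])  # True
```

In Python the lockstep loop is no faster; the point is the structure. In a SIMD implementation the inner loop over lanes becomes one vector instruction, so several buffers are processed for little more than the cost of one.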

True quantum computing will arrive before we know it, and the industry mindset has already begun to shift from “should this data be encrypted?” to “why is this data not encrypted?” As a community, we must focus on implementing advanced cryptography at the hardware level, along with accompanying algorithmic and software innovations, to meet the challenges of a post-quantum world. Doing so will lead to further breakthroughs in performance and security across many important encryption algorithms and help accelerate the transition to the next-generation cryptography schemes the industry will need over the coming decade.

This article was originally published on EE Times.

Wajdi Feghali is an Intel Fellow.
