
AutoBench™ 1.1: How Will Your Processor Perform in Automotive Applications?



EEMBC, the Embedded Microprocessor Benchmark Consortium, was formed in 1997 to develop meaningful performance benchmarks for the hardware and software used in embedded systems. Through the combined efforts of its members, EEMBC® benchmarks have become an industry standard for evaluating the capabilities of embedded processors, compilers, and Java implementations according to objective, clearly defined, application-based criteria.

Since the release of the first certified benchmark scores in April 2000, EEMBC scores have effectively replaced the obsolete Dhrystone MIPS, especially in situations where real engineering value is important. EEMBC benchmarks reflect real-world applications and the demands that embedded systems encounter in these environments. The result is a collection of "algorithms" and "applications" organized into benchmark suites targeting telecommunications, networking, digital media, Java, automotive/industrial, consumer, and office equipment products. An additional suite of algorithms specifically targets the capabilities of 8- and 16-bit microcontrollers.

AutoBench™ 1.1 is a suite of benchmarks that allows users to predict the performance of microprocessors and microcontrollers in automotive, industrial, and general-purpose applications. Its 16 benchmark kernels include the following:
  • Generic Workload Tests
    These tests include bit manipulation, matrix mapping, a specific floating-point tester, a cache buster, pointer chasing, pulse-width modulation, multiplication, and shift operations (typical of encryption algorithms).
  • Basic Automotive Algorithms
    These tests include controller area network (CAN), tooth-to-spark (locating the engine’s cog when the spark is ignited), angle-to-time conversion, road speed calculation, and table lookup and interpolation.
  • Signal Processing Algorithms
    These tests include algorithms which are becoming increasingly important for sensors used in engine knock detection, vehicle stability control, and occupant safety systems. They include fast Fourier transforms (FFT and iFFT), a finite impulse response filter (FIR), an inverse discrete cosine transform (iDCT), and an infinite impulse response (IIR) filter.
In this article we will detail how the various parts of the AutoBench benchmarks work and what they can tell you about how the processor you are considering for your next design will perform in automotive applications.

Angle to Time Conversion benchmark

The EEMBC Angle to Time Conversion benchmark simulates an embedded automotive application where the CPU reads a counter which measures the real-time delay between pulses sensed from a toothed wheel (gear) on the crankshaft of an engine. In this application, the CPU determines the top dead center (TDC) position on the crankshaft, computes the engine speed, and provides a conversion from the tooth wheel pulses to the precise crankshaft angle position. This value is expressed in linear time from TDC. The tooth wheel pulses actually represent crankshaft angle, and the delay between pulses yields angular velocity of the crankshaft (engine speed).

The kernel starts each pass of the loop by reading a previous real-time counter value from the test data file. The previous counter value is subtracted from the current counter value to determine the time between tooth edges. As long as the CPU does not detect TDC, the tooth pulse counter is incremented to indicate progress through a crankshaft revolution. As the tooth pulse counter increments, each cylinder is ‘fired’ in turn once its ‘firing angle’ (tooth number) is reached. At each cylinder firing, a precise ‘firing time’ is issued to an external hardware counter. Detection of the next TDC resets the tooth pulse counter to zero, and the entire process begins again.
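The arithmetic behind these steps can be sketched in a few lines of C. This is a minimal illustration, not the benchmark's source: the function names, the 16-bit free-running counter, the tick frequency parameter, and the use of tenths of a degree for angles are all assumptions made for the example.

```c
#include <stdint.h>

/* Time between two tooth edges, in timer ticks. Assumes a free-running
 * 16-bit counter; the cast makes the subtraction wrap correctly when the
 * counter has rolled over between the two readings. */
uint16_t tooth_period(uint16_t now, uint16_t prev)
{
    return (uint16_t)(now - prev);
}

/* Engine speed in revolutions per minute, given the tooth period, the
 * number of teeth per revolution, and the timer frequency in Hz
 * (all hypothetical parameters). Returns 0 if the inputs are unusable. */
uint32_t engine_rpm(uint16_t period_ticks, uint32_t teeth, uint32_t tick_hz)
{
    if (period_ticks == 0 || teeth == 0)
        return 0;
    return (uint32_t)((60ull * tick_hz) / ((uint64_t)period_ticks * teeth));
}

/* Convert a crankshaft angle (tenths of a degree after TDC) into a delay
 * in timer ticks, assuming constant speed over the revolution. */
uint32_t angle_to_ticks(uint16_t period_ticks, uint32_t teeth,
                        uint32_t angle_tenths)
{
    uint64_t ticks_per_rev = (uint64_t)period_ticks * teeth;
    return (uint32_t)((ticks_per_rev * angle_tenths) / 3600u);
}
```

For example, with a 36-tooth wheel and a tooth period of 100 ticks, one revolution takes 3600 ticks, so a firing angle of 45 degrees corresponds to `angle_to_ticks(100, 36, 450)`, which is 450 ticks after TDC.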

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 1: Optimization rules for the Angle to Time Conversion benchmark.

Figure 1: Algorithm flowchart for the Angle to Time Conversion benchmark.

Basic Integer and Floating Point benchmark

The EEMBC Basic Integer and Floating Point benchmark algorithm measures basic integer and floating point capabilities.

The benchmark calculates the arctan(x) function using the telescoping series:

arctan(x) = x * P(x^2) / Q(x^2)

where P and Q are polynomials, and x is assumed to be in the range from zero to tan(pi/4). The benchmark limits the input domain to ensure this condition is met and adjusts any output values which correspond to limited input values so that the correct result is always obtained.
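The following C sketch shows the general shape of such a calculation. The coefficients are an assumption: they come from a low-order rational approximation, P(x^2) = 15 + 4x^2 and Q(x^2) = 15 + 9x^2, chosen only because it is short, and are not the polynomials used by the benchmark. The range reduction uses the identity arctan(x) = pi/2 - arctan(1/x) for x > 1 and the odd symmetry of arctan for negative inputs.

```c
#define HALF_PI 1.57079632679489661923

/* Core approximation, valid for 0 <= x <= 1 (that is, up to tan(pi/4)).
 * Illustrative coefficients only: accurate to roughly 0.007 rad at x = 1. */
static double atan_core(double x)
{
    double x2 = x * x;
    return x * (15.0 + 4.0 * x2) / (15.0 + 9.0 * x2);
}

/* Limit the input domain to [0, 1], then adjust the output so that the
 * result is still correct for inputs outside that range. */
double atan_approx(double x)
{
    int negative = (x < 0.0);
    double r;

    if (negative)
        x = -x;                           /* arctan(-x) = -arctan(x)   */
    if (x > 1.0)
        r = HALF_PI - atan_core(1.0 / x); /* reduce x > 1 into (0, 1)  */
    else
        r = atan_core(x);
    return negative ? -r : r;
}
```

A production implementation would use higher-order polynomials to reach the required precision; the structure of limiting the domain and correcting the output stays the same.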

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 2: Optimization rules for the Basic Integer and Floating Point benchmark.

Figure 2: Algorithm flowchart for the Basic Integer and Floating Point benchmark.

Bit Manipulation benchmark

The EEMBC Bit Manipulation benchmark simulates an embedded automotive/industrial application that manipulates large numbers of bits, performs bit arithmetic, and makes many decisions based on bit values.

The kernel simulates part of a character display system where characters are shifted into a line buffer. The line buffer is then converted into a series of pixels by mapping characters through a display character ROM. The pixels are moved into a display buffer until the entire buffer is displayed.
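As a rough illustration of this data flow, the C sketch below shifts character codes into a line buffer, maps them through a character ROM to produce one row of pixels, and tests individual pixel bits. The buffer length, the 8x8 glyph size, and all function names are hypothetical and chosen only for the example.

```c
#include <stdint.h>
#include <string.h>

#define LINE_LEN   4   /* characters in the line buffer (hypothetical) */
#define GLYPH_ROWS 8   /* pixel rows per glyph, 8 pixels wide (hypothetical) */

/* Shift a new character code into the right-hand end of the line buffer. */
void line_shift_in(uint8_t line[LINE_LEN], uint8_t code)
{
    memmove(line, line + 1, LINE_LEN - 1);
    line[LINE_LEN - 1] = code;
}

/* Map each character through the ROM to get one row of pixels,
 * packed eight pixels per byte. */
void render_row(const uint8_t rom[][GLYPH_ROWS], const uint8_t line[LINE_LEN],
                unsigned row, uint8_t out[LINE_LEN])
{
    for (unsigned i = 0; i < LINE_LEN; i++)
        out[i] = rom[line[i]][row];
}

/* Test a single pixel in a rendered row; bit 7 of each byte is leftmost. */
int pixel_is_set(const uint8_t row_bytes[LINE_LEN], unsigned x)
{
    return (row_bytes[x / 8u] >> (7u - (x % 8u))) & 1u;
}
```

Calling `render_row` once per glyph row and copying the result into a display buffer would build up the full image; the per-pixel test is where most of the bit-level decisions occur.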

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 3: Optimization rules for the Bit Manipulation benchmark.

Figure 3: Algorithm flowchart for the Bit Manipulation benchmark.

Cache “Buster” benchmark

The EEMBC Cache “buster” benchmark simulates an embedded automotive/industrial application without a cache. It highlights performance in situations where long sections of control code are executed with very little backward branching or revisiting of the same data. Processors that use look-ahead mechanisms rather than caches should perform well here.

The kernel uses an intricate algorithm involving data and function pointers to ensure that data and code locality does not occur during execution.

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 4: Optimization rules for the Cache “Buster” benchmark.



Figure 4: Algorithm flowchart for the Cache “Buster” benchmark.

CAN Remote Data Request benchmark

The EEMBC CAN Remote Data Request benchmark simulates an embedded automotive application where a controller area network (CAN) interface node exists for exchanging messages across the system.

The situation being simulated is the one that occurs when a remote data request (RDR) message is received by all nodes. Every node must check the identifier of the message to see if it owns that type of data. If so, the responsible node must gather the data and transmit it back onto the network for the originator of the RDR.

The kernel fetches received messages from a simulated receiver buffer, checks the identification (ID) field and ignores those messages which it is not interested in. Interesting messages are then usually stored, unless they are an RDR message. In this case, the data associated with the ID is sought and then placed into a simulated transmit buffer for sending back to the originator.
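The message handling described above can be sketched as follows. The message layout, the `rtr` flag name, and the `can_process` function are assumptions made for this example and do not reflect the benchmark's actual data structures.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified CAN message (hypothetical layout). */
typedef struct {
    uint16_t id;       /* message identifier            */
    uint8_t  rtr;      /* 1 = remote data request (RDR) */
    uint8_t  len;      /* number of valid data bytes    */
    uint8_t  data[8];
} can_msg;

typedef enum { CAN_IGNORED, CAN_STORED, CAN_REPLIED } can_result;

/* Check one received message against the IDs this node owns.
 * Uninteresting IDs are ignored, data messages are stored, and an RDR
 * causes the stored data to be copied into the transmit slot *tx. */
can_result can_process(const can_msg *rx, can_msg *owned, size_t n_owned,
                       can_msg *tx)
{
    for (size_t i = 0; i < n_owned; i++) {
        if (owned[i].id != rx->id)
            continue;
        if (rx->rtr) {
            *tx = owned[i];   /* gather the data for the originator */
            tx->rtr = 0;      /* the reply is an ordinary data frame */
            return CAN_REPLIED;
        }
        uint8_t len = (rx->len > 8) ? 8 : rx->len;
        owned[i].len = len;
        memcpy(owned[i].data, rx->data, len);
        return CAN_STORED;
    }
    return CAN_IGNORED;
}
```

A caller would loop over the simulated receive buffer, pass each message to `can_process`, and queue `*tx` for transmission whenever the result is `CAN_REPLIED`.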

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 5: Optimization rules for the CAN Remote Data Request benchmark.



Figure 5: Algorithm flowchart for the CAN Remote Data Request benchmark.

Fast Fourier Transform (FFT) benchmark

The EEMBC Fast Fourier Transform (FFT) benchmark simulates an embedded automotive/industrial application performing a power spectrum analysis of a time varying input waveform.

The kernel computes the ‘radix-2’ decimation in frequency FFT on complex input values stored in real and imaginary arrays. After the time domain values are converted to the equivalent frequency domain, the power spectrum is calculated.
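A compact C sketch of a radix-2 decimation-in-frequency FFT followed by a power spectrum calculation is shown below. It is an illustration under stated assumptions rather than the benchmark code: it uses double-precision values for readability, requires the length n to be a power of two, and takes a precomputed twiddle table (`tw_re`, `tw_im`, holding cos and -sin of 2*pi*k/n for k = 0 .. n/2-1) as hypothetical parameters, much as an embedded implementation might keep the table in ROM.

```c
#include <stddef.h>

/* In-place radix-2 decimation-in-frequency FFT on separate real and
 * imaginary arrays. n must be a power of two. */
void fft_dif(double *re, double *im, size_t n,
             const double *tw_re, const double *tw_im)
{
    for (size_t span = n / 2; span >= 1; span /= 2) {
        size_t stride = n / (2 * span);          /* step through the table */
        for (size_t start = 0; start < n; start += 2 * span) {
            for (size_t k = 0; k < span; k++) {
                size_t i = start + k, j = i + span;
                double wr = tw_re[k * stride], wi = tw_im[k * stride];
                double dr = re[i] - re[j], di = im[i] - im[j];
                re[i] += re[j];                  /* upper output: a + b       */
                im[i] += im[j];
                re[j] = dr * wr - di * wi;       /* lower output: (a - b) * W */
                im[j] = dr * wi + di * wr;
            }
        }
    }
    /* DIF leaves the results in bit-reversed order; put them back. */
    for (size_t i = 0, j = 0; i < n; i++) {
        if (i < j) {
            double t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
        size_t bit = n >> 1;
        while (j & bit) { j ^= bit; bit >>= 1; }
        j ^= bit;
    }
}

/* Power spectrum: squared magnitude of each frequency bin. */
void power_spectrum(const double *re, const double *im, double *pw, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pw[i] = re[i] * re[i] + im[i] * im[i];
}
```

For a 4-point transform the twiddle table is simply {1, 0} for the real parts and {0, -1} for the imaginary parts, and the input {1, 2, 3, 4} transforms to {10, -2+2i, -2, -2-2i}.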

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 6: Optimization rules for the Fast Fourier Transform benchmark.



Figure 6: Algorithm flowchart for the Fast Fourier Transform benchmark.

Finite Impulse Response (FIR) Filter benchmark

The EEMBC Finite Impulse Response (FIR) Filter benchmark algorithm simulates an embedded automotive/industrial application where the CPU performs FIR filtering on 16-bit or 32-bit fixed-point sample values. High- and low-pass FIR filters process the input signal data.
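A fixed-point FIR filter of this kind can be sketched in C as follows. The Q15 coefficient format, the rounding and saturation steps, and the function name `fir_q15` are assumptions chosen for the illustration; the benchmark's own data formats may differ.

```c
#include <stdint.h>
#include <stddef.h>

/* Direct-form FIR filter on 16-bit samples with Q15 coefficients
 * (32768 represents 1.0). Samples before the start of x are treated
 * as zero. */
void fir_q15(const int16_t *x, int16_t *y, size_t n,
             const int16_t *h, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        int64_t acc = 0;                    /* wide accumulator, no overflow */
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += (int32_t)h[k] * x[i - k];
        /* Round to nearest and drop the 15 fractional bits. Right-shifting
         * a negative value is implementation-defined in C; an arithmetic
         * shift, as most compilers provide, is assumed here. */
        acc = (acc + (1 << 14)) >> 15;
        if (acc > INT16_MAX) acc = INT16_MAX;   /* saturate to 16 bits */
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = (int16_t)acc;
    }
}
```

With the two coefficients {16384, 16384} (0.5 each) the filter is a two-point moving average, a very simple low-pass filter: the input {1000, 2000, 3000} produces {500, 1500, 2500}.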

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 7: Optimization rules for the Finite Impulse Response Filter benchmark.



Figure 7: Algorithm flowchart for the Finite Impulse Response Filter benchmark.

Inverse Discrete Cosine Transform (iDCT) benchmark

The EEMBC inverse Discrete Cosine Transform (iDCT) benchmark simulates an embedded automotive/industrial application that performs digital video and graphics processing, such as image recognition.

The kernel performs iDCT on an input data matrix set using 64-bit integer arithmetic.

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 8: Optimization rules for the inverse Discrete Cosine Transform benchmark.



Figure 8: Algorithm flowchart for the inverse Discrete Cosine Transform benchmark.

Inverse Fast Fourier Transform (iFFT) benchmark

The EEMBC inverse Fast Fourier Transform (iFFT) benchmark simulates an embedded automotive/industrial application performing a time-domain analysis of an input frequency spectrum. This might be used in noise cancellation applications.

The kernel computes the ‘radix-2’ decimation in frequency iFFT on complex input values stored in real and imaginary arrays.

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 9: Optimization rules for the inverse Fast Fourier Transform benchmark.



Figure 9: Algorithm flowchart for the Inverse Fast Fourier Transform benchmark.

Infinite Impulse Response (IIR) filter benchmark

The EEMBC Infinite Impulse Response (IIR) filter benchmark algorithm simulates an embedded automotive/industrial application where the CPU performs IIR filtering on 16-bit or 32-bit fixed-point sample values. It implements a Direct Form II, N-cascaded, second-order IIR filter. IIR filters can often be more efficient than FIR filters, in terms of attaining a better magnitude response with a given filter order. This is because IIR filters incorporate feedback and are capable of realizing both poles and zeros of a system, whereas FIR filters can realize only zeros. The difference equations for each second-order section (biquad) of a Direct Form II N-cascaded IIR filter are:
u(n) = x(n) + a(1)*u(n-1) + a(2)*u(n-2)
y(n) = b(0)*u(n) + b(1)*u(n-1) + b(2)*u(n-2)

where:
x(n) = input signal of the biquad at time n
u(n) = state variable of the biquad at time n
y(n) = output signal of the biquad at time n
a(k), b(k) = coefficients of the biquad

High- and low-pass IIR filters process the input signal data. Binary comparators also digitize the outputs of the filters. This IIR filter benchmark explores a CPU’s ability to perform multiply-accumulates and rounding. It employs typical DSP functions that would replace an analog signal chain comprised of op-amps and comparators.
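One way to picture a cascaded Direct Form II biquad is the C sketch below. It assumes the usual Direct Form II structure, in which the state variable u(n) is formed from the input plus feedback from u(n-1) and u(n-2), with the feedback terms added so that the a coefficients carry their own sign. It uses double-precision arithmetic for readability, whereas the benchmark works on fixed-point values, and the type and function names are hypothetical.

```c
#include <stddef.h>

/* One second-order section: coefficients plus two delayed state values. */
typedef struct {
    double a1, a2;        /* feedback coefficients (sign included) */
    double b0, b1, b2;    /* feed-forward coefficients             */
    double u1, u2;        /* u(n-1) and u(n-2)                     */
} biquad;

/* Process one sample through one biquad in Direct Form II. */
double biquad_step(biquad *s, double x)
{
    double u = x + s->a1 * s->u1 + s->a2 * s->u2;            /* poles */
    double y = s->b0 * u + s->b1 * s->u1 + s->b2 * s->u2;    /* zeros */
    s->u2 = s->u1;
    s->u1 = u;
    return y;
}

/* Process one sample through N cascaded biquads: the output of each
 * stage is the input of the next. */
double cascade_step(biquad *stages, size_t n_stages, double x)
{
    for (size_t i = 0; i < n_stages; i++)
        x = biquad_step(&stages[i], x);
    return x;
}
```

Each sample costs five multiply-accumulate operations per stage, which is why this kind of kernel stresses a processor's multiply-accumulate throughput; a fixed-point version would add a rounding step after each accumulation.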

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 10: Optimization rules for the Infinite Impulse Response filter benchmark.



Figure 10: Algorithm flowchart for the Infinite Impulse Response filter benchmark.

Matrix Arithmetic benchmark

The EEMBC Matrix Arithmetic benchmark simulates an embedded automotive/industrial application which performs a lot of matrix arithmetic.

The kernel performs an LU decomposition on ‘n x n’ input matrices. It also computes the determinant of the input matrix and then a cross product with a second matrix.
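The link between LU decomposition and the determinant can be sketched in C as follows. This is a minimal Doolittle-style factorization without pivoting, which is an assumption made to keep the example short; whether the benchmark pivots is not stated here. The function names and the row-major storage are also chosen only for illustration.

```c
#include <stddef.h>

/* In-place LU decomposition of an n x n row-major matrix, no pivoting.
 * Afterwards the upper triangle (with diagonal) holds U and the strict
 * lower triangle holds L, whose unit diagonal is implied.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_decompose(double *a, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (a[k * n + k] == 0.0)
            return -1;                    /* would need row pivoting */
        for (size_t i = k + 1; i < n; i++) {
            a[i * n + k] /= a[k * n + k]; /* multiplier, stored in L */
            for (size_t j = k + 1; j < n; j++)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return 0;
}

/* With no row exchanges, det(A) is the product of the diagonal of U. */
double lu_determinant(const double *lu, size_t n)
{
    double det = 1.0;
    for (size_t i = 0; i < n; i++)
        det *= lu[i * n + i];
    return det;
}
```

For the 2 x 2 matrix with rows (4, 3) and (6, 3), the factorization gives a multiplier of 1.5 and a second pivot of -1.5, so the determinant is 4 * -1.5 = -6.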

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 11: Optimization rules for the Matrix Arithmetic benchmark.



Figure 11: Algorithm flowchart for the Matrix Arithmetic benchmark.

Pointer Chasing benchmark

The EEMBC Pointer Chasing benchmark simulates an embedded automotive/industrial application which performs a lot of pointer manipulation.

The kernel employs a doubly linked list then searches the list for entries which match an input token. A large set of input tokens is used to exercise the entire list. The number of steps taken to find each input token is recorded.
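A minimal C sketch of this pattern is shown below. The node layout and the function names are hypothetical; the point is only that every step of the search is a dependent load through a pointer, which is what the benchmark stresses.

```c
#include <stddef.h>

typedef struct node {
    int token;
    struct node *prev;
    struct node *next;
} node;

/* Link an array of nodes into a doubly linked list in array order. */
void list_build(node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        nodes[i].prev = (i > 0) ? &nodes[i - 1] : NULL;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
}

/* Search from head for a matching token. *steps receives the number of
 * nodes visited, whether or not a match is found. */
node *list_find(node *head, int token, unsigned *steps)
{
    unsigned count = 0;
    for (node *p = head; p != NULL; p = p->next) {
        count++;
        if (p->token == token) {
            *steps = count;
            return p;
        }
    }
    *steps = count;
    return NULL;
}
```

Running `list_find` over a large set of tokens and recording `*steps` for each one mirrors the step counts that the kernel records.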

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 12: Optimization rules for the Pointer Chasing benchmark.



Figure 12: Algorithm flowchart for the Pointer Chasing benchmark.

Pulse Width Modulation (PWM) benchmark

The EEMBC Pulse Width Modulation benchmark simulates an application in which an actuator is driven by a PWM signal proportional to some input. Specifically, the algorithm presumes that the embedded processor is driving an H-bridge motor driver with both direction and enable signals. Outputs are provided for two such H-bridge drivers, as might be used for a bipolar stepper motor driver, or proportional DC motor driver.

The stepper motor controls the position of the actuator. The motor is controlled by passing a desired position command to the algorithm, which then moves the motor to that position.

On each pass, the algorithm simulates the PWM signals and checks to see if the motor has reached the commanded position once per PWM cycle. By providing the stepper motor with phasing signals as well as PWM control of each phase, the motor can be micro-stepped to provide finer resolution and smoother motion. The phase control provides direction signals for energizing each of the stepper motor coils in a typical bipolar full-step sequence. The algorithm could be used in applications with actuators other than stepper motors, making use of just the PWM feature without the phasing control, in which case the PWM signals would provide proportional velocity control, while the phase signals would provide motor direction.
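The two ingredients, a PWM comparison and a phase sequence that advances toward a commanded position, can be sketched in C as follows. The names, the counter-based PWM model, and the particular two-phase-on full-step table are assumptions for illustration and are not taken from the benchmark.

```c
#include <stdint.h>

/* PWM output level for a free-running counter: high for the first
 * `duty` counts of every `period` counts. */
int pwm_level(unsigned counter, unsigned period, unsigned duty)
{
    if (period == 0)
        return 0;
    return (counter % period) < duty;
}

/* Direction signals for the two coils (A, B) in a bipolar full-step
 * sequence with both phases energized: +1 = forward, -1 = reverse. */
const int8_t FULL_STEP[4][2] = {
    {+1, +1}, {-1, +1}, {-1, -1}, {+1, -1}
};

typedef struct {
    int      position;   /* current position in steps       */
    unsigned phase;      /* index into FULL_STEP, 0 .. 3     */
} stepper;

/* Called once per PWM cycle: move one step toward the commanded
 * position, or hold if it has already been reached. */
void stepper_step_toward(stepper *m, int target)
{
    if (m->position < target) {
        m->position++;
        m->phase = (m->phase + 1u) & 3u;
    } else if (m->position > target) {
        m->position--;
        m->phase = (m->phase + 3u) & 3u;   /* one step backwards */
    }
}
```

Micro-stepping would vary the `duty` value applied to each coil between full steps instead of switching the coils fully on and off.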

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 13: Optimization rules for the Pulse Width Modulation benchmark.



Figure 13: Algorithm flowchart for the Pulse Width Modulation benchmark.

Road Speed Calculation benchmark

The EEMBC Road Speed Calculation benchmark simulates an automotive application where the CPU repeatedly calculates the road speed based on differences between timer counter values. All values are filtered to minimize errors due to noise. The calculation involves straightforward arithmetic, but it must also handle the cases where the timer counter rolls over or where the measurement results show abrupt changes. At zero road speed, the application has to ensure that it does not wait indefinitely for a counter increment.

The benchmark has a mix of arithmetic and flow control routines. The arithmetic portion involves add, subtract, multiply and divide. For low end microcontrollers, the arithmetic capability may become a performance bottleneck. For higher end processors, the pipeline efficiency may be more important than raw performance since there are a significant number of compare and branch instructions (i.e., flow control). A processor that is good in both aspects will shine with this kind of benchmark.
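The mix of arithmetic and flow control can be seen in a small C sketch. Everything specific in it is an assumption for illustration: the 16-bit timer, the scale constant, the idle-poll limit used to detect zero speed, and the simple three-quarters weighting filter.

```c
#include <stdint.h>

#define SPEED_SCALE 1000000u  /* hypothetical ticks-to-speed constant         */
#define MAX_IDLE    50u       /* hypothetical polls without a pulse => zero   */

typedef struct {
    uint16_t last_count;   /* timer value at the previous pulse   */
    uint32_t filtered;     /* filtered road speed                 */
    uint32_t idle_polls;   /* consecutive polls with no new pulse */
} speed_state;

/* Call once per pass with the latest timer capture and a flag saying
 * whether a new wheel pulse was seen. Returns the filtered speed. */
uint32_t road_speed_update(speed_state *s, uint16_t count, int pulse_seen)
{
    if (!pulse_seen) {
        /* Do not wait forever for a pulse: report zero after a timeout. */
        if (s->idle_polls < MAX_IDLE)
            s->idle_polls++;
        if (s->idle_polls >= MAX_IDLE)
            s->filtered = 0;
        return s->filtered;
    }
    s->idle_polls = 0;

    /* The cast makes the difference correct across a timer rollover. */
    uint16_t delta = (uint16_t)(count - s->last_count);
    s->last_count = count;
    if (delta == 0)
        return s->filtered;             /* avoid division by zero */

    uint32_t raw = SPEED_SCALE / delta; /* speed is inversely proportional */
    s->filtered = (3u * s->filtered + raw) / 4u;  /* smooth out noise */
    return s->filtered;
}
```

Even this short routine contains one divide, one multiply, several adds and subtracts, and five conditional branches, which illustrates why both arithmetic capability and branch handling affect the score. A fuller version would also limit abrupt changes in `raw` before filtering.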

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 14: Optimization rules for the Road Speed Calculation benchmark.



Figure 14: Algorithm flowchart for the Road Speed Calculation benchmark.

Table Lookup and Interpolation benchmark

The EEMBC Table Lookup and Interpolation benchmark algorithm is used in engine controllers, anti-lock brake systems, and other applications to access constant data more quickly than by raw calculation. Instead of storing all data points, which would consume a lot of memory, selected data points are stored and the software then interpolates between them. Data may be stored in two-dimensional (X,Y) or three-dimensional (X,Y,Z) tables.

For example, software periodically performs a table lookup to derive an output value, the ignition angle, from two input variables, engine load and engine speed. The engine control continuously derives the input variables, load and speed, from external engine sensors. Speed is derived by measuring the period between pulses from a magnetic pickup sensing gear teeth on the crankshaft. Load is derived from sensors measuring air flow through the throttle body.

The bilinear interpolation technique determines values by using four points in a grid that surrounds the desired point.

This algorithm simulates engine load and speed which are indices into an “angle” table. The engine load (X) and engine speed (Y) values are calculated and normalized. The ignition angle (Z) value is then interpolated from the table.
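Bilinear interpolation over such a table can be sketched in C as follows. The table layout (row-major, one row per speed breakpoint), the convention that the normalized load and speed are expressed in grid-index units, and the function name are assumptions made for this example.

```c
#include <stddef.h>

/* Interpolate an ignition angle (Z) from a table indexed by engine load
 * (X, columns) and engine speed (Y, rows). `load` and `speed` are already
 * normalized to grid-index units, so 1.5 means halfway between breakpoints
 * 1 and 2. Inputs outside the table are clamped to its edges.
 * Requires nx >= 2 and ny >= 2. */
double bilinear(const double *table, size_t nx, size_t ny,
                double load, double speed)
{
    if (load < 0.0) load = 0.0;
    if (load > (double)(nx - 1)) load = (double)(nx - 1);
    if (speed < 0.0) speed = 0.0;
    if (speed > (double)(ny - 1)) speed = (double)(ny - 1);

    /* Lower-left corner of the surrounding grid cell. */
    size_t ix = (size_t)load, iy = (size_t)speed;
    if (ix > nx - 2) ix = nx - 2;
    if (iy > ny - 2) iy = ny - 2;
    double fx = load - (double)ix, fy = speed - (double)iy;

    /* The four points that surround the desired point. */
    double z00 = table[iy * nx + ix];
    double z10 = table[iy * nx + ix + 1];
    double z01 = table[(iy + 1) * nx + ix];
    double z11 = table[(iy + 1) * nx + ix + 1];

    /* Interpolate along X on both rows, then along Y between them. */
    double low  = z00 + fx * (z10 - z00);
    double high = z01 + fx * (z11 - z01);
    return low + fy * (high - low);
}
```

For a 2 x 2 table holding 0, 10, 20, and 30, the centre of the cell, `bilinear(t, 2, 2, 0.5, 0.5)`, evaluates to 15, the average of the four corners.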

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 15: Optimization rules for the Table Lookup and Interpolation benchmark.



Figure 15: Algorithm flowchart for the Table Lookup and Interpolation benchmark.

Tooth-to-Spark benchmark

This EEMBC Tooth-to-Spark benchmark simulates an automotive application where the CPU controls fuel injection and ignition in the engine combustion process. Tooth-to-Spark, part of an engine control unit (ECU), performs real-time processing of air/fuel mixture and ignition timing. Based on the operating conditions presented to the ECU, the CPU adjusts the output values for fuel injector duration and ignition timing from ‘nominal’ values on each pass.

The ECU determines whether the engine is running or not, and enables the fuel pump and igniters accordingly. While the engine is being started, the ECU performs special fuel injection duration and spark timing to optimize starting conditions.

Once the engine is running, the CPU processes the output variables for injector and igniter timing on each pass. The CPU primarily makes adjustments according to the engine speed/load parameters, but also makes lesser adjustments for other variables.

The entire process is repeated on each pass, taking input values from the test data and computing new output values. The input test data can reside in ROM or RAM, so comparisons can be made for performance from either memory source.

Category Allowed Disallowed
ANSI C X  
Intrinsics/Language extensions X  
Custom libraries X  
Assembly language X  
HW accelerators X  

Table 16: Optimization rules for the Tooth-to-Spark benchmark.



Figure 16: Algorithm flowchart for the Tooth-to-Spark benchmark.

Summary

This article provided details on how the various parts of the AutoBench benchmarks work and what they can tell you about how the processor you are considering for your next design will perform in automotive applications. As can be seen, there are many parameters that must be weighed depending on the specific application requirements of the system. Careful consideration of the key application needs will help in determining the best part for the design.
