
Which ARM Cortex CPU is Right for Your Next MCU-Based Application?



ARM-based CPUs are ubiquitous in the MCU world, and often several are available from the same MCU supplier. Each ARM CPU has been optimized for a specific class of processing requirements, from low-end, power-constrained applications to high-performance, dual-core applications. The most popular ARM CPU family in MCU devices today appears to be the Cortex family. How do you decide which ARM Cortex CPU is the right one for your application? Let’s explore the primary differences between some of the more popular Cortex-based MCUs by looking at example implementations that will help you decide which one is right for your next design.

Many options

It’s not unusual to find many different ARM Cortex CPUs within a single MCU family. The Cortex CPU and its optional extensions address a variety of application needs, but all have backward-compatible instruction sets, as illustrated in Figure 1. Starting with the Cortex-M0/M0+/M1 family, the instruction set targets general-purpose data processing and I/O tasks. The Cortex-M3 CPU adds advanced data-processing and bit-field manipulation instructions that speed up more complex control and general-purpose computational tasks. The Cortex-M4 CPU adds digital signal processing (DSP) instructions and offers single-instruction, multiple-data (SIMD) operations, where one data-processing instruction operates on multiple data elements at the same time. These specialized capabilities can dramatically accelerate complex data-processing tasks, like those found in audio and video applications. The Cortex-M4 CPU can also add a floating-point unit (FPU) when performance and precision are both important for the target algorithm. Analog sensing and motor control, for example, often use floating point for its precision, but also need high performance to close fast control loops.
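To make the SIMD idea concrete, here is a minimal sketch in portable C that models a dual 16-bit add: two independent 16-bit lanes packed into one 32-bit word and added in a single operation, with no carry crossing from the low lane into the high lane. The helper names (`pack16x2`, `lane_hi`, `lane_lo`, `add16x2`) are hypothetical and chosen for illustration only. On a Cortex-M4 this kind of operation would normally be a single DSP instruction, typically reached through a vendor or CMSIS intrinsic rather than written out like this.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word
 * (hi in bits 31..16, lo in bits 15..0). */
uint32_t pack16x2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Extract a lane as a signed value. The unsigned-to-signed conversion
 * assumes the usual two's-complement behaviour of mainstream compilers. */
int16_t lane_hi(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
int16_t lane_lo(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Portable model of a dual 16-bit add: each lane is summed on its own
 * and wraps modulo 2^16, so a carry out of the low lane is discarded
 * instead of leaking into the high lane. */
uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0x0000FFFFu;
    uint32_t sum = (hi << 16) | lo;

    /* Sanity check: the low lane never disturbs the high lane. */
    assert((sum >> 16) == hi);
    return sum;
}
```

Adding `pack16x2(5, -1)` and `pack16x2(0, 1)` with this model leaves 5 in the high lane and 0 in the low lane, whereas an ordinary 32-bit add would carry into the high lane and produce 6 there. That lane independence is what lets one instruction process two audio samples at once.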


Figure 1: ARM Cortex MCUs instruction set compatibility. (Courtesy of ARM and STMicroelectronics) 

One key advantage of this regular, backward-compatible instruction set is that MCU manufacturers can create devices optimized for specific applications, while “covering all bets” by offering upward compatibility if the algorithm grows in complexity during the lifetime of the target system. For example, how many times have you needed to add more complex functions in order to satisfy new requirements during development? The upward compatibility of the Cortex Instruction Set Architecture (ISA) makes this easy. In some cases it is also possible to simplify the target architecture: if lower performance is acceptable, you can move down to a simpler CPU to reduce cost, provided the code avoids instructions the simpler CPU lacks.

STMicroelectronics has used several ARM Cortex CPUs within its STM32 MCU family. Figure 2 illustrates the various Cortex CPUs and the key hardware features associated with each MCU series. Notice that the Cortex-M0 CPU is used on the entry-level STM32F030/50/051 devices, while the Cortex-M4 with DSP and FPU is used on the high-performance STM32F4xx (such as the STM32F401RCT6) and STM32F3xx devices. The mid-range devices use the Cortex-M3 CPU, for applications that do not need hardware DSP and FPU instructions to reach their performance targets. (These operations can be implemented as sequences of simpler instructions if needed, and most compilers provide a fairly transparent method for switching between hardware implementations and multi-cycle “soft” implementations.)
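As a rough illustration of how transparent that switch can be, the sketch below shows a small PI control-loop step written in ordinary single-precision C. The controller itself (`pi_ctrl_t`, `pi_step`) and the helper `has_hw_single_fp` are hypothetical examples, not code from any vendor library. The same source can be built for a Cortex-M4 with an FPU or for a Cortex-M3 with library-based floating point. With GCC, for instance, the choice is typically made with options such as `-mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard` versus `-mcpu=cortex-m3`. The `__ARM_FP` macro is assumed here to follow the ARM C Language Extensions convention, in which bit 2 indicates hardware single-precision support.

```c
#include <assert.h>

/* Returns 1 when the compiler is targeting hardware single-precision
 * floating point on an ARM core, 0 otherwise (including non-ARM hosts).
 * Assumes the ACLE meaning of __ARM_FP. */
int has_hw_single_fp(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x04)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical PI controller state. */
typedef struct {
    float kp;        /* proportional gain */
    float ki;        /* integral gain, pre-scaled by the sample period */
    float integral;  /* accumulated integral term */
    float out_min;   /* lower output limit */
    float out_max;   /* upper output limit */
} pi_ctrl_t;

/* Limit x to the range [lo, hi]. */
static float clampf(float x, float lo, float hi)
{
    return (x < lo) ? lo : (x > hi) ? hi : x;
}

/* One control-loop iteration. The source is identical whether the
 * multiplies and adds become FPU instructions or library calls. */
float pi_step(pi_ctrl_t *c, float setpoint, float measured)
{
    assert(c != 0 && c->out_min <= c->out_max);

    float error = setpoint - measured;

    /* Clamp the integral so it cannot wind up beyond the output range. */
    c->integral = clampf(c->integral + c->ki * error, c->out_min, c->out_max);

    return clampf(c->kp * error + c->integral, c->out_min, c->out_max);
}
```

Only the build options change between the two targets. What differs is the cycle count per call, which is why the FPU matters when the loop has to close quickly.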


Figure 2: STM32F MCU family Cortex CPUs and key hardware features. (Courtesy of STMicroelectronics) 

Other vendors also support multiple flavors of ARM Cortex MCUs, often over wide performance and cost ranges. Silicon Labs, for example, has the EFM32 family of MCUs (e.g., the EFM32ZG222F32-QFP48) that use the ARM Cortex-M CPU. The low-end, low-power ZG series uses the Cortex-M0+ CPU, while the mid-range TG, G, LG, and GG series use the Cortex-M3 CPU. The high-end WG series uses the Cortex-M4 CPU with DSP and FPU enhancements. There are 10 different package options, so with a little up-front planning you can migrate from one CPU type to another. This makes it easier to adapt to changing requirements or to offer different products using the same base design.

The Cortex-A architecture

The Cortex-M architecture is very popular with MCU manufacturers, but the Cortex-A architecture is also showing up in vendors’ devices, often in MPUs, where large external memories are used for instructions and data. The Cortex-A CPU is optimized for very-high-performance applications, often with requirements for features like video playback and advanced security. The Atmel SAMA5D4 MPUs, for example (Figure 3), use the Cortex-A5 CPU with a 2 x 32 KB Level 1 cache and a 128 KB Level 2 cache to speed processing. An on-chip DDR2/LPDDR/LPDDR2 controller fetches instructions and data from external memory, so very large programs and data sets can be used. Video processing algorithms, for instance, can require both very large data sets and very large programs. Hardware subsystems provide significant capabilities for security, connectivity, control, and user interfaces to simplify the creation of complex human-machine interfaces and the associated control systems.


Figure 3: Atmel SAMA5D4 MPU block diagram. (Courtesy of Atmel) 

The SAMA5D4 also supports two important Cortex extensions: TrustZone and NEON. TrustZone supports secure code execution. Typically a processor needs to execute some security-related functions (such as secure PIN entry or password protection) as well as some normal program functions (such as a graphic display or menu-selection routines). The TrustZone hardware extensions allow the programmer to protect security-related functions from normal accesses and potential security attacks. Even debug capabilities can be limited to just the normal program to further protect the secure routines from snooping and similar attacks.

The NEON extensions provide significant processing performance improvements for SIMD-based algorithms. Common targets for the NEON extension are multimedia, signal processing, 2D/3D graphics, video encode/decode, and sound synthesis. NEON has its own independent pipeline and register file and supports signed/unsigned 8-, 16-, 32-, and 64-bit integer operations as well as single-precision floating-point operations. The register file can be viewed as 32 registers that are 64 bits wide or as 16 registers that are 128 bits wide, depending on the instruction. NEON can typically provide a significant performance boost over non-SIMD implementations; a 60-150 percent performance boost on complex video codecs is a typical example.
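The sketch below shows what a simple NEON-accelerated routine can look like in C, assuming a compiler that provides the standard `arm_neon.h` intrinsics and defines `__ARM_NEON` when NEON is enabled. The function name `add_f32` is hypothetical. When NEON is not available, the code falls back to a plain scalar loop, so the same source builds and gives the same results on any target.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise sum: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != 0 && a != 0 && b != 0);

    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process four single-precision lanes per 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* add lanes and store  */
    }
#endif

    /* Scalar tail, and the complete loop when NEON is not enabled. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Real multimedia kernels are more involved than a vector add, but the structure is usually the same: a wide main loop over full registers followed by a scalar loop for any leftover elements.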

Multi-core CPU solutions

ARM Cortex CPUs are also showing up in multi-core MCU implementations. These devices sometimes pair two CPUs with different performance levels: a very-high-performance CPU for the “heavy lifting” of the target application functions, and a lower-performance CPU for managing communications ports, user interfaces, and similar low-level control functions. Other multi-core devices simply replicate the same type of CPU, which makes it easy to partition and allocate less-specialized processing functions to achieve the right balance of processing and power efficiency. For example, one CPU could be put in a low-power wait state when it is not needed to meet the performance requirement (perhaps during a “slow” data period) and then woken up when additional processing is required.

Texas Instruments, in its Concerto MCU family illustrated in Figure 4 (for example, the F28M35H52C1RFPT), has added an ARM Cortex-M3 processor alongside its popular C28x CPU to provide an easy solution for both control and connectivity in a single device. The C28x CPU is optimized for real-time control and builds on more than 15 years of DSP application experience. The ARM Cortex-M3 CPU is optimized for communications applications and can leverage the extensive ARM ecosystem for communications drivers (Ethernet, USB, CAN, SPI, etc.) as well as robust scheduling and OS support.


Figure 4: Dual CPU Core Concerto™ MCU family from Texas Instruments. (Courtesy of Texas Instruments)

Matching your application to the right ARM Cortex CPU

You can find ARM Cortex CPUs in a wide variety of MCU families from just about every MCU manufacturer. To match the right Cortex CPU with your application, start by determining which instruction set is the best fit. In particular, check whether advanced data-processing capabilities like floating point or DSP are required. Do you need even more advanced features like NEON or TrustZone? Perhaps your application is more control oriented and low power is a key requirement? If so, the simpler Cortex-M0 architecture might be the right fit. Mid-range designs can take advantage of the Cortex-M3 CPU, with the device selected based on connectivity requirements and other key peripherals; you typically have the most choice in mid-range devices from your MCU manufacturer.
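The questions above can be sketched as a simple decision function. This is only a rough encoding of the rules of thumb in this article, not a vendor selection guide, and every name in it (`app_req_t`, `core_t`, `suggest_core`) is hypothetical. It assumes, as the article does, that NEON and TrustZone point to a Cortex-A device, and it treats a required capability as more important than a low-power preference.

```c
#include <assert.h>
#include <stdbool.h>

/* Candidate CPU classes discussed in the article. */
typedef enum {
    CORE_CORTEX_M0,     /* Cortex-M0/M0+: simple, low-power control */
    CORE_CORTEX_M3,     /* mid-range, general purpose               */
    CORE_CORTEX_M4,     /* adds DSP/SIMD instructions               */
    CORE_CORTEX_M4_FPU, /* Cortex-M4 with floating-point unit       */
    CORE_CORTEX_A       /* MPU class with NEON and TrustZone        */
} core_t;

/* Hypothetical summary of application requirements. */
typedef struct {
    bool needs_neon_or_trustzone; /* heavy multimedia or secure execution  */
    bool needs_fpu;               /* fast, precise floating-point math     */
    bool needs_dsp;               /* DSP/SIMD data processing              */
    bool low_power_control;       /* control oriented, power is the driver */
} app_req_t;

/* Check the most demanding requirements first, then fall back to
 * simpler cores. */
core_t suggest_core(const app_req_t *r)
{
    assert(r != 0);

    if (r->needs_neon_or_trustzone) return CORE_CORTEX_A;
    if (r->needs_fpu)               return CORE_CORTEX_M4_FPU;
    if (r->needs_dsp)               return CORE_CORTEX_M4;
    if (r->low_power_control)       return CORE_CORTEX_M0;
    return CORE_CORTEX_M3;          /* mid-range default */
}
```

In practice the CPU choice is only the first filter. Peripherals, memory size, package, and cost then narrow the list within the chosen CPU class.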

If your requirements change during the design phase, you may want to migrate to a more feature-rich device or to a leaner one. In this case it may be important to select an MCU family that supports easy migration between devices. You can also use the large ARM Cortex ecosystem to leverage proven drivers, an RTOS, function-specific libraries, and development toolchains. No matter which ARM Cortex CPU you select, you can be sure there will be a robust ecosystem available to simplify your design.

For more information on the parts discussed in this article, use the links provided to access product pages on the Hotenda website.
