Look at the high end of the microcontroller (MCU) segment today and you will find ICs with micro-architectural and instruction-set characteristics that resemble those found in the microprocessor segment. Indeed, companies such as Texas Instruments (TI) and Freescale offer both MCUs and embedded-targeted microprocessors with a shared legacy. Still, there are significant differences between the two segments that range from the companies' chip design and manufacturing strategies to integrated feature sets and the software support ingrained in the hardware. Embedded design teams must carefully consider their application to make the right choice of taking the MCU or microprocessor path even before considering device selection.
Looking at MCU and microprocessor technology from a high level, it appears that the segments butt up against one another. You have compatible or overlapping instruction sets in both camps, and MCU clock frequencies have escalated to a point that approaches the bottom end of the microprocessor segment.
Let us consider some examples of what appears to be a smooth progression of compatible processor technology moving up an MCU product line and into the microprocessor area. Freescale offers the Kinetis MCU family based on the ARM Cortex-M4 core, including products such as the K10, K20, K30, K40, and K60 MCUs. On the microprocessor side, Freescale offers the i.MX microprocessor family. The i.MX1/2, i.MXL, and i.MXS families are based on an ARM9 core. The i.MX3x family is based on an ARM11 core, and the i.MX5x family is based on an ARM Cortex-A8 core.
Clearly there are differences in the CPU cores used across the Freescale ARM-based portfolio. We will discuss architectural differences in cores throughout the article, but in a simplified comparison, they all support the same basic instruction set. The Cortex-M4 core is optimized for low power and a smaller die size. The ARM9, ARM11, and ARM Cortex-A8 progression basically represents a scaling of performance capabilities. We could make largely the same type of comparison with Freescale's PowerPC-based portfolio including the MPC6x and MPC7x microprocessor families and the PowerPC MCU family.
In the case of TI (Figure 1), there is a parallel story both in the case of ARM-based families and in DSP-centric processors. The Stellaris MCU family is based on the ARM Cortex-M3 core and includes 3000, 5000, 6000, 8000, and 9000 Series MCUs. The Sitara microprocessor family includes ARM9-based AM1x processors and Cortex-A8-based AM3x processors.
Figure 1: The TI Concerto MCU family combines an ARM Cortex-M3 core and a floating-point-enabled, DSP-centric core based on the company's C2000-family architecture.
MCU vs. microprocessor performance
The MCU and microprocessor landscape indeed looks continuous from a high level. You can almost consider the border between the two similar to, say, comparing 16- and 32-bit MCUs where you have overlapping choices for many applications. But doing so would be a mistake. Let's dig deeper to discover why.
By some measures, MCUs and microprocessors are close from a performance perspective. Consider the CoreMark benchmarks published by EEMBC (Embedded Microprocessor Benchmark Consortium). Specifically, let's discuss the CoreMark/MHz scores that eliminate clock speed from the comparison. Published scores for Freescale Kinetis MCUs range from 2.05 to 2.95. Published scores for the i.MX5 family range from 2.28 to 2.45.
The Kinetis and i.MX5 performance is close, with Kinetis even having the upper hand on a per-MHz basis. Of course, we know that the i.MX5 is faster, and part of the reason is clock speed. Across the i.MX portfolio, maximum clock speed scales from 400 MHz to 1.2 GHz. The fastest available Kinetis processors run at 150 MHz, although Freescale has announced plans for 200 MHz devices. Let's reconsider the CoreMark score with clock speed in the picture. An i.MX5 running at 800 MHz scored 1964, while the highest Kinetis score is 427 for a 150 MHz MCU.
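The relationship between the absolute and per-MHz figures is simple division. The following minimal Python sketch uses only the two scores and clock speeds quoted above; the function name is illustrative and not part of any EEMBC tool.

```python
def coremark_per_mhz(score: float, clock_mhz: float) -> float:
    """Normalize an absolute CoreMark score by the clock frequency in MHz."""
    return score / clock_mhz


# Scores and clock speeds as quoted in the text above.
imx5 = coremark_per_mhz(1964, 800)    # i.MX5 running at 800 MHz
kinetis = coremark_per_mhz(427, 150)  # Kinetis running at 150 MHz

print(f"i.MX5:   {imx5:.3f} CoreMark/MHz")
print(f"Kinetis: {kinetis:.3f} CoreMark/MHz")
print(f"Absolute score ratio: {1964 / 427:.1f}x")
```

The Kinetis part does more work per clock cycle, yet the i.MX5 finishes the benchmark roughly 4.6 times faster because its clock runs more than five times as fast.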
The clock speed discrepancy
In actuality, there will remain a significant gap between MCU and microprocessor performance for a number of reasons, starting with clock speed. Maximum clock speed depends both on the manufacturing process and the CPU microarchitecture.
We all know that clock speeds go up as manufacturers move along Moore's law to finer process geometries. MCUs will always be on older processes. Sangmin Chon, TI's C2000 marketing manager, said, "In MCUs, we will always be a couple of nodes behind." Chon noted that MCUs are typically deployed in much more environmentally challenging applications, such as higher-temperature applications, than are microprocessors. That fact leads the MCU makers to rely on robust, more conservative process nodes.
Now let's consider some microarchitecture elements that will gate MCU clock speed. Deep pipelines are an important feature that has enabled faster microprocessor clock speeds. Some microprocessors have used 20 stages or more. The recent trend has been more on the order of 10 stages, and MCUs typically rely on 3 to 5 stages.
We will now look at an example where the difference is minimal. Freescale's fastest PowerPC MCUs are based on the e300 core that has 4 stages. The MPC6x and MPC7x microprocessors are based on the e600 core that has 7 stages. Theoretically the e300 core is capable of 667 MHz operation, whereas the e600 is capable of 1.8 GHz operation. The pipeline is not the only gating factor, but it is an important one.
Certainly MCU designers could increase pipeline depth, but doing so runs counter to many accepted MCU characteristics. For example, consider interrupt response. An interrupt requires that a CPU reload the pipeline after a context switch so there are a number of cycles in which no instructions are completed. MCUs are already at a disadvantage in clock speed relative to microprocessors, so that pipeline reload process can create unacceptable latency with long pipelines.
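To put rough numbers on the pipeline-reload argument, here is a minimal Python sketch. It assumes a deliberately simplified model in which refilling an emptied pipeline costs one clock cycle per stage, and it ignores register stacking, vector fetch, and memory wait states. The stage counts and clock speeds are illustrative values picked from the ranges mentioned in this article, not measurements of any specific device.

```python
def refill_time_ns(pipeline_stages: int, clock_mhz: float) -> float:
    """Time to refill an emptied pipeline, assuming one clock cycle per stage.

    Simplified model: real interrupt latency also includes register stacking,
    vector fetch, and memory wait states, none of which are modeled here.
    """
    cycle_ns = 1000.0 / clock_mhz
    return pipeline_stages * cycle_ns


# Illustrative (assumed) configurations, not vendor data.
short_mcu = refill_time_ns(4, 150)    # shallow pipeline at an MCU clock
deep_mcu = refill_time_ns(20, 150)    # hypothetical deep pipeline at an MCU clock
deep_mpu = refill_time_ns(20, 1200)   # deep pipeline at a microprocessor clock

for label, value in [("4 stages @ 150 MHz", short_mcu),
                     ("20 stages @ 150 MHz", deep_mcu),
                     ("20 stages @ 1.2 GHz", deep_mpu)]:
    print(f"{label}: {value:.1f} ns")
```

Under these assumptions, the deep pipeline at an MCU-class clock is the worst case by a wide margin, which is the combination the paragraph above warns against.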
Long pipelines also impact the silicon footprint, and the die size directly impacts cost. The microprocessor market is more willing to accept higher prices for better performance, whereas cost remains a huge concern in most MCU-based systems.
Let's move on to a comparison of the memory situation in MCUs relative to microprocessors and the impact of memory implementations in applications. Integrated memory has been a defining tenet of the MCU space, going back to the earliest products such as the 8051. Originally designed by Intel, the 8051 remains popular today and is sold by many manufacturers.
TI's Chon said, "When you think of the embedded MCU space, you think of devices with embedded memory." Chon added that with most applications, the memory requirements will guide the design team in choosing an MCU or MPU. Today, most MCU-based designs rely on integrated flash memory to store code. TI's top-end ARM MCU, the Stellaris 9000 Series, integrates 512 Kbytes of flash (Figure 2). There are MCUs on the market today that integrate several Mbytes of flash, but clearly, the size of an application that an MCU can handle is constrained.
Figure 2: The 9000 series MCUs in TI's ARM Cortex-M3-based Stellaris family integrate 512 Kbytes of flash for code storage.
Then again, there are exceptions to the on-chip flash story. NXP Semiconductors, for example, is using an interface called Quad SPI (sometimes called SPIFI for SPI Flash Interface) with a number of its MCUs. The Quad SPI link uses 4 wires and can quadruple the access speeds to flash memory. NXP stated that the interface can yield 40 Mbps read rates.
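The "quadruple" claim follows from moving four bits per clock instead of one. The Python sketch below shows that arithmetic under stated assumptions: single-data-rate transfers, no command, address, or dummy-cycle overhead, and a hypothetical 10 MHz serial clock that is not taken from any NXP specification.

```python
def raw_read_rate_bits_per_s(clock_hz: float, data_lines: int) -> float:
    """Peak raw transfer rate, assuming one bit per data line per clock cycle.

    Ignores command, address, and dummy-cycle overhead, so sustained
    rates on real hardware will be lower.
    """
    return clock_hz * data_lines


CLOCK_HZ = 10_000_000  # hypothetical serial clock chosen for illustration only

single = raw_read_rate_bits_per_s(CLOCK_HZ, 1)  # conventional one-line SPI read
quad = raw_read_rate_bits_per_s(CLOCK_HZ, 4)    # Quad SPI read on four data lines

print(f"Single-line SPI: {single / 1e6:.0f} Mbit/s")
print(f"Quad SPI:        {quad / 1e6:.0f} Mbit/s")
print(f"Speedup:         {quad / single:.0f}x")
```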
The first products shipping in NXP's ARM Cortex-M4-based LPC4300 MCU family do not integrate any flash. Indeed, the LPC4330 and LPC4350 MCUs rely entirely on Quad-SPI-connected memory. The external flash is treated as part of the MCU's memory map, and the MCU can boot from the external memory. Even when NXP ships LPC4300 MCUs with integrated flash, the Quad SPI capability will come in handy, allowing design teams to choose the amount of memory required in an application.
NXP product marketing manager Gordon Cooper said that flexibility is especially important in multimedia and graphics applications where integrated flash cannot handle the code size required for the application. The interface does not provide the type of speeds found in high-speed DRAM interfaces used with microprocessors, but still Quad SPI allows for scalability that is not typically available in MCUs.
Memory management or protection
Another major difference between most MCUs and microprocessors comes in memory management or protection. All of the microprocessors discussed here integrate memory management units (MMUs) as a feature of the microarchitecture. In contrast, most MCUs integrate what is often called a memory protection unit.
Both MMUs and memory protection units can serve to partition the memory space and protect mission-critical code and data from tampering. For example, a memory partition can be dedicated to private use by one CPU task. Operating systems can also reserve access to memory areas. The MMUs go a step further and virtualize the memory space allowing an operating system seamless access to a fragmented memory map.
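The distinction can be sketched in a few lines of Python. This is a conceptual model only, not the register interface of any real MPU or MMU. The `Region` class, the function names, the 4 Kbyte page size, and the addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class Region:
    base: int
    size: int
    writable: bool


def mpu_check(regions, addr, write):
    """MPU-style check: the address is used as-is; only access permission is decided."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return region.writable or not write
    return False  # no region covers the address, so the access faults


def mmu_translate(page_table, vaddr):
    """MMU-style translation: map a virtual page number to a physical frame."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[page] * PAGE_SIZE + offset


# A read-only region protecting mission-critical data.
regions = [Region(base=0x2000_0000, size=0x1000, writable=False)]
read_ok = mpu_check(regions, 0x2000_0010, write=False)
write_ok = mpu_check(regions, 0x2000_0010, write=True)

# Two contiguous virtual pages backed by scattered physical frames.
page_table = {0: 7, 1: 3}
phys_a = mmu_translate(page_table, 0x0010)
phys_b = mmu_translate(page_table, 0x1010)

print(read_ok, write_ok, hex(phys_a), hex(phys_b))
```

In this sketch the protection check only answers yes or no for a given address, whereas the translation step lets software see pages 0 and 1 as contiguous even though they sit in physical frames 7 and 3.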
The presence of, or lack of, an MMU is another key guiding point for most design teams contemplating processors. You need an MMU if your product will run a general-purpose operating system such as Linux. Most real-time operating systems for embedded applications work equally well with MMUs or memory protection units.
While rare, you can find full MMU capabilities in select MCUs. For example, the Freescale PowerPC MCU family includes an MMU. Atmel (Figure 3) also integrates MMUs in its ARM9-based SAM9 product family.
Figure 3: Atmel's SAM9XE MCU family is based on an ARM9 core that includes an MMU. Most MCUs only integrate a memory protection unit.
In the Atmel case, there is a fine line with the nomenclature that applies to the products. Atmel calls the SAM9R and SAM9M families embedded microprocessors. It calls the sibling SAM9XE products MCUs. The embedded microprocessors are available at higher clock speeds, as fast as 400 MHz, and include DRAM interfaces. All of the ARM9-based products include an MMU, however, and all integrate peripheral functionality in an MCU fashion.
Having mentioned peripheral integration, let's go a bit deeper in that area. Both microprocessors and MCUs integrate a rich set of peripheral functions these days, but there are generally differences in the mix. On a microprocessor, you typically find mainly digital peripheral functionality. Typical examples include Ethernet and USB, graphics controllers, and audio controllers.
In the MCU space you will also find connectivity peripherals and graphics controllers, albeit usually targeted at smaller, lower-resolution displays. In MCUs, however, you typically find a host of analog peripherals such as data converters and pulse-width modulation (PWM) peripherals. Again, there are always exceptions. The previously mentioned Atmel embedded microprocessors include A/D converters and PWMs. A number of Freescale's i.MX microprocessors include A/D converters, timers, analog audio capability, and PWMs, as depicted in Figure 4.
Figure 4: Some members of Freescale's i.MX microprocessor family include analog-centric peripherals such as those commonly found in MCUs, including A/D converters and PWMs as illustrated by the i.MX251.
Customer-driven decisions
Despite the technical divide discussed here between the microprocessor and MCU space, the design team will face instances where they must choose between a high-end MCU and a low-end microprocessor. Driven by such customers, the IC makers are reacting to the overlapping space. TI's Chon noted that, "From the microprocessor side, we do see the line blurring."
One thing that TI has done is introduce the StarterWare software-development package for its ARM9 and Cortex-A8-based processors. StarterWare is free and eliminates the need for a complex operating system in many cases. Still, the tool enables quick development of systems with features such as USB and network stacks. Indeed the software is more like the StellarisWare tool that the company ships with the MCUs than the operating-system-based tools it has historically provided for microprocessor customers.
It is also customer demand that has driven the evolution of TI's ARM-plus-DSP products in both the MCU and microprocessor segments. From the outside looking in, you might presume that TI wanted to offer similar architectures across both segments that afforded scalable performance and feature set.
In actuality, customers drove the ARM-plus-DSP products in the segments for completely unrelated reasons. In the case of the TMS320C6A816x and OMAP families, customers needed the DSP to handle multimedia- and communication-centric routines that were math intensive, while also needing a processor that could host a complex operating system and user applications. In the case of the Concerto MCUs, customers were looking for the convenience of an ARM core upon which it was simple to implement functions such as USB and network stacks. In the MCU case, the DSP is targeted primarily at industrial-type applications such as real-time control of motors and drives.
Look at application requirements
Summing up our discussion here, the message is not a simple one. The MCU and microprocessor segments are moving toward overlap, and there are instances where either might suffice in your design. You should start the decision process with a high-level look at application requirements and choices, such as whether a robust operating system is required, leading to a microprocessor, or whether system footprint limits mandate that code fit in integrated memory on an MCU. The IC makers suggest that in many cases, design teams know going in that they will follow either the MCU or microprocessor design path. If that choice is blurred, the team must closely evaluate the growing trend of peripheral integration in microprocessors or the escalating clock rates and external memory support emerging in MCUs. It is these exceptions to current conventions in feature set that will likely lead to the right choice.