
Using Intelligent Peripherals to Improve CPU Efficiency

Modern MCUs have added a wide range of new features that, when used correctly, can dramatically improve application efficiency. In particular, intelligent peripherals (peripherals that can operate independently of the CPU) allow the CPU to perform other tasks in parallel or to be put into a low-power sleep mode. Either technique improves overall processing efficiency and also saves power.

The DMA controller

One of the first intelligent peripherals encountered when doing MCU-based designs is the direct memory access (DMA) controller. This specialized hardware block can transfer data between memory and/or peripherals without the need for the CPU to be involved with each transfer. Advanced DMA controllers, such as the one included on the STMicroelectronics STM32F4 family, can further offload the CPU by using features for flexible data stream allocation and transfer management. Let’s look at some of these features in more detail to see how they can be used to improve processing efficiency.

Figure 1 shows a block diagram of the data paths available in one of the two DMA controllers on STM32F4 devices. As shown on the left side of the figure, each of the eight streams selects its DMA request from one of eight channels (allocated to the various DMA-enabled peripherals), and the stream requests are routed to the arbiter, which establishes a priority among them: software-programmed priority levels are compared first, and among streams with equal software priority, the lower-numbered stream wins. The highest-priority transfer is then activated, and the AHB masters on the right side of the figure execute the data transfer. Separate masters for the memory and peripheral interfaces further improve efficiency for peripheral-to-memory transfers, probably the most common use of DMA in MCU-based designs.
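The arbitration rule can be sketched as a small host-side model in C. This is not ST's HAL or register interface: the `stream_state_t` structure, its `pending` flag, and `dma_arbitrate()` are hypothetical names. The four priority levels mirror the low/medium/high/very-high levels the STM32F4 DMA offers, with the stream number used only to break ties.

```c
#include <stdbool.h>

#define NUM_STREAMS 8

/* Software priority levels; a larger value means more urgent. */
typedef enum { PRIO_LOW = 0, PRIO_MEDIUM, PRIO_HIGH, PRIO_VERY_HIGH } dma_prio_t;

/* Hypothetical per-stream state as seen by the arbiter. */
typedef struct {
    bool       pending;  /* a request is waiting on this stream */
    dma_prio_t prio;     /* software priority programmed for the stream */
} stream_state_t;

/* Return the stream to service next, or -1 if nothing is pending.
   Software priority is compared first. On a tie the lower-numbered
   stream wins, because the scan runs upward and only a strictly
   higher priority replaces the current winner. */
int dma_arbitrate(const stream_state_t streams[NUM_STREAMS])
{
    int winner = -1;
    for (int s = 0; s < NUM_STREAMS; s++) {
        if (!streams[s].pending)
            continue;
        if (winner < 0 || streams[s].prio > streams[winner].prio)
            winner = s;
    }
    return winner;
}
```

For example, if streams 2 and 5 are both pending at high priority, stream 2 is chosen. A request on stream 7 at very-high priority would then take precedence over both.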

The allocation of a separate FIFO to each stream, as shown in the middle of Figure 1, allows the FIFO behavior to be tuned to each peripheral interface. For example, the FIFO threshold level (the fill level at which a transfer is requested) can be set individually to ¼, ½, ¾, or full. This allows lower-speed channels to wait until the FIFO is almost full before requesting a transfer, minimizing overhead. Faster channels can initiate a transfer sooner, perhaps at ½, to avoid a FIFO overrun.

Figure 1: STM32F4 family DMA controller (Courtesy of STMicroelectronics).
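The threshold trade-off can be reasoned about with a short calculation. The sketch below assumes the STM32F4 per-stream FIFO of four 32-bit words (16 bytes) and a deliberately simplified model: once the threshold is reached, the stream may wait up to a worst-case bus latency before its burst starts, and every byte the peripheral delivers during that wait must fit in the remaining space. The function name and both inputs are hypothetical; real latency depends on arbitration and bus load.

```c
#include <stdint.h>

#define FIFO_BYTES 16u  /* STM32F4 per-stream FIFO: four 32-bit words */

/* Threshold options in quarters of the FIFO, largest first (4 = full). */
static const uint32_t threshold_quarters[] = {4u, 3u, 2u, 1u};

/* Pick the highest threshold (fewest, largest bursts) whose remaining
   headroom can absorb the bytes arriving while the DMA waits for the bus.
     byte_period_ns   - assumed time between incoming peripheral bytes
     worst_latency_ns - assumed worst-case wait before the burst starts
   Returns the threshold in quarters, or 0 if even 1/4 is not safe. */
uint32_t choose_fifo_threshold(uint32_t byte_period_ns, uint32_t worst_latency_ns)
{
    if (byte_period_ns == 0u)
        return 0u;  /* no valid data rate given */

    /* Bytes that can arrive during the wait, rounded up to be safe. */
    uint32_t bytes_during_wait =
        (worst_latency_ns + byte_period_ns - 1u) / byte_period_ns;

    for (unsigned i = 0; i < sizeof threshold_quarters / sizeof threshold_quarters[0]; i++) {
        uint32_t level    = FIFO_BYTES * threshold_quarters[i] / 4u;
        uint32_t headroom = FIFO_BYTES - level;
        if (headroom >= bytes_during_wait)
            return threshold_quarters[i];
    }
    return 0u;
}
```

Under these assumptions, a slow channel (one byte every 10 µs with a 2 µs worst-case wait) lands on ¾, while a fast one (one byte every 100 ns with a 600 ns wait) needs ½, matching the qualitative guidance in the text.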

Other advanced DMA capabilities to look for relate to the management of data transfers. Some peripherals provide an end-of-transfer indicator, which an advanced DMA controller can detect and use to terminate the transfer independently of the CPU. Double buffering and circular buffer management, when handled by the DMA controller, eliminate CPU overhead by automatically reconfiguring source and destination addresses during transfers. If the CPU had to manage these low-level tasks, processing efficiency would clearly suffer.
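Double buffering can be modeled in a few lines. In this sketch, `dbuf_dma_write()` stands in for the DMA hardware: it fills one half of a ping-pong buffer and swaps to the other half on its own, so the CPU only processes completed halves and never reprograms an address. All names and the tiny `HALF_LEN` are illustrative assumptions, not any vendor's driver API.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_LEN 4u  /* illustrative; real buffers are usually larger */

/* Hypothetical model of a circular, double-buffered DMA destination. */
typedef struct {
    uint16_t buf[2][HALF_LEN];
    unsigned active;  /* half currently being filled by the "DMA" */
    size_t   index;   /* next write position inside the active half */
} dbuf_dma_t;

/* Simulate one peripheral-to-memory transfer. Returns the half (0 or 1)
   that just completed, or -1 if none did. The swap to the other half
   happens here, standing in for the DMA controller's automatic switch. */
int dbuf_dma_write(dbuf_dma_t *d, uint16_t sample)
{
    d->buf[d->active][d->index++] = sample;
    if (d->index < HALF_LEN)
        return -1;
    int done = (int)d->active;
    d->active ^= 1u;  /* switch halves without CPU involvement */
    d->index = 0;
    return done;
}

/* CPU-side work on a completed half while the other half is filling. */
uint32_t process_half(const dbuf_dma_t *d, int half)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < HALF_LEN; i++)
        sum += d->buf[half][i];
    return sum;
}
```

The key design point is that the CPU is notified once per half rather than once per sample, and the destination switch is never on its critical path.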

This flexibility in mapping, prioritizing, and managing data transfers significantly reduces CPU overhead: once the intelligent DMA controller is initialized, transfers can be managed and bandwidth allocated efficiently without further CPU intervention. This degree of independent operation is a key feature of any intelligent peripheral, one the designer should look for when selecting target devices, and one we will also find in the other intelligent peripherals discussed next.

Look for intelligence in serial peripherals

Once the use of DMA is understood, it is a natural extension to look for serial peripherals with additional intelligence, both to take full advantage of DMA capabilities and to further offload low-level functions from the CPU. Dedicated FIFO buffers integrated into high-speed peripherals, such as Ethernet and USB, provide additional autonomy from the CPU, since transfers can be staged and then handled in a single burst, perhaps via DMA, to improve efficiency. Intelligent peripherals can also let the programmer set the FIFO levels at which the CPU is interrupted, based on bandwidth requirements. Note that these peripheral FIFOs can work in conjunction with the FIFOs dedicated to the DMA controller, like those shown in Figure 1 for the STM32F4 devices. The peripheral FIFOs provide the first level of buffering, and the DMA FIFOs provide a second level, allocated according to which peripherals are active at the same time. This adds a level of management and control (that is, intelligence) that is not available when FIFOs exist only at the peripherals.
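A programmable interrupt level is easy to picture as a ring-buffer FIFO that raises its request line only when a chosen number of entries is waiting. The model below is a minimal sketch. `pfifo_t`, its `level` and `irq` fields, and the eight-entry depth are hypothetical and do not correspond to any particular USB or Ethernet block.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFIFO_DEPTH 8u

/* Hypothetical peripheral receive FIFO with a programmable interrupt
   (or DMA-request) level: the CPU is signalled once `level` entries
   are waiting, instead of once per received byte. */
typedef struct {
    uint8_t  data[PFIFO_DEPTH];
    unsigned head, tail, count;
    unsigned level;     /* 1..PFIFO_DEPTH, chosen per bandwidth needs */
    bool     irq;       /* models the interrupt/request line */
    unsigned overruns;  /* bytes dropped because the FIFO was full */
} pfifo_t;

/* Called by the "peripheral" for each received byte. */
void pfifo_rx(pfifo_t *f, uint8_t byte)
{
    if (f->count == PFIFO_DEPTH) {
        f->overruns++;
        return;
    }
    f->data[f->tail] = byte;
    f->tail = (f->tail + 1u) % PFIFO_DEPTH;
    f->count++;
    if (f->count >= f->level)
        f->irq = true;
}

/* Drain everything in one burst (by the CPU or a DMA stream) and clear
   the request. Returns the number of bytes copied to `out`. */
unsigned pfifo_drain(pfifo_t *f, uint8_t *out)
{
    unsigned n = 0;
    while (f->count > 0u) {
        out[n++] = f->data[f->head];
        f->head = (f->head + 1u) % PFIFO_DEPTH;
        f->count--;
    }
    f->irq = false;
    return n;
}
```

Raising `level` trades fewer interrupts for less headroom before an overrun, which is the same balance the DMA FIFO threshold strikes one level up.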

Many peripherals also include flexible interrupts that can be used to request CPU intervention, and if the interrupt is specific enough to tell the CPU exactly what service is required, response time can be significantly reduced. If the interrupt is not intelligent, the CPU must search through a variety of flags or status bits to determine what action to take. Using peripherals with intelligent interrupts can make a big difference when timing budgets and latency requirements are most aggressive.
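The difference between the two interrupt styles can be sketched in C. Everything here is hypothetical: an eight-event peripheral, a handler table, and two entry points. One entry point models a shared vector that must scan every status flag. The other models an interrupt that already identifies its event, for example through a dedicated vector or an event ID.

```c
#include <stdint.h>

/* Hypothetical peripheral with eight event sources. */
enum { EV_RX_FULL, EV_TX_EMPTY, EV_OVERRUN, EV_FRAME_ERR,
       EV_BREAK, EV_IDLE, EV_CTS, EV_TIMEOUT, EV_COUNT };

typedef void (*handler_t)(void);

static unsigned handled[EV_COUNT];  /* counts how often each handler ran */

static void on_rx_full(void) { handled[EV_RX_FULL]++; }
static void on_timeout(void) { handled[EV_TIMEOUT]++; }
static void on_other(void)   { /* placeholder for the remaining events */ }

static const handler_t handlers[EV_COUNT] = {
    on_rx_full, on_other, on_other, on_other,
    on_other,   on_other, on_other, on_timeout,
};

/* Non-intelligent interrupt: one shared vector, so the handler tests
   every status flag. Returns the number of flags examined. */
unsigned isr_scan_flags(uint8_t status)
{
    unsigned checks = 0;
    for (unsigned ev = 0; ev < EV_COUNT; ev++) {
        checks++;
        if (status & (1u << ev))
            handlers[ev]();
    }
    return checks;
}

/* Intelligent interrupt: the event is already identified, so the
   handler dispatches straight to the right service routine. */
void isr_direct(unsigned event_id)
{
    if (event_id < EV_COUNT)
        handlers[event_id]();
}
```

With the scanning style, servicing a timeout costs eight flag tests. The direct style costs one table lookup regardless of which event fired.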

Some MCUs take this approach one step further and eliminate the interrupt completely for some operations. The Energy Micro (now part of Silicon Labs) EFM32GZ family includes a special Peripheral Reflex System (PRS) that can implement many functions that would otherwise require an interrupt, by allowing fast and autonomous communication between peripherals. This can eliminate the need to interrupt the CPU for simple housekeeping functions, since events from one peripheral can be used as input signals or triggers by other peripherals. These signals are selected and routed via one of four configurable interconnect channels. Outputs from producing peripherals (those generating an event) are routed to consumers (peripherals triggered by an event) and adjusted for level or rising/falling edge sensitivity.
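The producer/channel/consumer idea, including edge selection, can be modeled as follows. This is a behavioral sketch, not the Silicon Labs emlib API. `prs_channel_t`, `prs_mode_t`, and `prs_step()` are hypothetical, and real PRS hardware performs this selection and edge detection in logic, with no code running.

```c
#include <stdbool.h>

#define PRS_CHANNELS 4  /* the text describes four interconnect channels */

typedef enum { PRS_OFF, PRS_LEVEL, PRS_POS, PRS_NEG, PRS_BOTH } prs_mode_t;

/* Hypothetical model of one reflex channel: it selects a producer's
   output signal and passes it on as a level or as an edge pulse. */
typedef struct {
    const bool *source;  /* producer output (e.g. a timer overflow) */
    prs_mode_t  mode;
    bool        last;    /* previous source level, for edge detection */
} prs_channel_t;

/* Evaluate one channel for the current cycle. Returns true when the
   consumer attached to this channel should be triggered. */
bool prs_step(prs_channel_t *ch)
{
    bool now  = ch->source ? *ch->source : false;
    bool rise = now && !ch->last;
    bool fall = !now && ch->last;
    ch->last  = now;

    switch (ch->mode) {
    case PRS_LEVEL: return now;
    case PRS_POS:   return rise;
    case PRS_NEG:   return fall;
    case PRS_BOTH:  return rise || fall;
    default:        return false;
    }
}
```

Two channels can watch the same producer with different sensitivities. One channel might start an ADC on the rising edge of a timer signal while another triggers a different consumer on the falling edge.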

An example use of the PRS is illustrated in Figure 2. The timer can be used to trigger the start of an ADC conversion, and the ADC conversion-complete signal can be used to trigger a DMA transfer. The DMA completion signal, in turn, can be used to reset the timer and start the sequence over again. No CPU intervention is required and no interrupt needs to be generated. Note that an additional counter could be connected through the PRS and used to wake the CPU after a set number of measurements (perhaps 1,000). The CPU could then process all 1,000 samples at once to further improve processing and power efficiency.

Figure 2: Silicon Labs EFM32GZ family peripheral reflex system example (Courtesy of Silicon Labs).
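The Figure 2 chain can be simulated tick by tick to show where the CPU does and does not appear. In this sketch, `chain_t`, `chain_tick()`, and the constant `fake_adc_read()` value are hypothetical stand-ins for the timer, ADC, DMA, and extra counter. Nothing inside `chain_tick()` represents CPU code except the wake-up count.

```c
#include <stdint.h>

#define SAMPLES_PER_WAKE 1000u

/* Hypothetical model of the timer -> ADC -> DMA -> timer-reset chain,
   plus a counter on the DMA-done event that wakes the CPU per batch. */
typedef struct {
    uint32_t timer, period;           /* timer fires when it reaches period */
    uint16_t buffer[SAMPLES_PER_WAKE];
    uint32_t dma_count;               /* counter fed by DMA-done events */
    uint32_t cpu_wakeups;
} chain_t;

static uint16_t fake_adc_read(void) { return 512; }  /* stand-in ADC result */

/* Advance the model by one timer tick. */
void chain_tick(chain_t *c)
{
    if (++c->timer < c->period)
        return;
    /* Timer event starts an ADC conversion. */
    uint16_t sample = fake_adc_read();
    /* ADC-complete event makes the DMA store the sample. */
    c->buffer[c->dma_count++] = sample;
    /* DMA-done event resets the timer, restarting the sequence. */
    c->timer = 0;
    /* The extra counter wakes the CPU only once per full batch. */
    if (c->dma_count == SAMPLES_PER_WAKE) {
        c->cpu_wakeups++;
        c->dma_count = 0;  /* the CPU would process buffer[] here */
    }
}
```

With a 10-tick period, 10,000 ticks produce 1,000 samples but only a single CPU wake-up, compared with 1,000 interrupts if each conversion were serviced individually.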

Multicore MCUs create intelligent peripherals

The ultimate offload engine for a high-performance MCU is a coprocessor that can manage the peripheral I/O functions with complete independence. The recent move to multicore MCUs, such as the NXP LPC4370FET100E, allows the designer to create a completely independent channel controller dedicated to peripheral control. In fact, the NXP LPC4370 has three CPU cores: the main ARM Cortex-M4 CPU, a coprocessor-oriented ARM Cortex-M0 CPU, and a special peripheral-control-oriented ARM Cortex-M0 CPU. Figure 3 shows that the peripheral-oriented CPU (in the upper left of the block diagram) is part of a peripheral subsystem that includes an AHB subsystem bus matrix, an SPI port, subsystem GPIO, and local SRAM memories. A core-to-core bridge connects the subsystem to the rest of the device through the main AHB bus matrix. The peripheral subsystem has all the hardware needed to manage peripherals independently, and in some cases it can be the only CPU active, with the other CPUs in low-power states to improve power efficiency.

Figure 3: NXP LPC4370 family block diagram (Courtesy of NXP).
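When a peripheral core hands results to the main core, a common pattern is a single-producer/single-consumer mailbox in SRAM that both cores can reach. The sketch below shows that general technique with C11 atomics. It is not NXP's inter-processor communication library. `mailbox_t`, `mbox_send()`, and `mbox_receive()` are hypothetical names, and a real design would also place the structure in a shared memory region and signal the other core with an event or interrupt.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u  /* must be a power of two so indices wrap cleanly */

/* Hypothetical SPSC mailbox: the producer core writes only `head`, the
   consumer core writes only `tail`, so no lock is required. */
typedef struct {
    uint32_t         msg[MBOX_SLOTS];
    _Atomic uint32_t head;  /* next slot to write (producer-owned) */
    _Atomic uint32_t tail;  /* next slot to read (consumer-owned) */
} mailbox_t;

/* Producer side (e.g. the peripheral-control core). */
bool mbox_send(mailbox_t *m, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&m->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_acquire);
    if (head - tail == MBOX_SLOTS)
        return false;  /* full */
    m->msg[head % MBOX_SLOTS] = value;
    /* Release ordering makes the message visible before the new head. */
    atomic_store_explicit(&m->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (e.g. the main core). */
bool mbox_receive(mailbox_t *m, uint32_t *value)
{
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&m->head, memory_order_acquire);
    if (head == tail)
        return false;  /* empty */
    *value = m->msg[tail % MBOX_SLOTS];
    atomic_store_explicit(&m->tail, tail + 1u, memory_order_release);
    return true;
}
```

The design uses only atomic loads and stores, never read-modify-write operations. This matters because Cortex-M0 cores lack the exclusive-access instructions that atomic increments would otherwise need.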

However, intelligent peripheral control need not stop there. The other ARM Cortex-M0 CPU (the coprocessor-oriented core) can also be used for peripheral control, perhaps for the analog DAC and ADC subsystems or as an intelligent motor control peripheral. This layering of intelligent peripheral control makes it possible to enable only the subsystem required: the main CPU during high-performance data-processing functions, the low-speed intelligent peripheral interface during command processing over the SPI port (with the rest of the device in a power-down mode), or the higher-speed intelligent peripheral controller during motor control or analog operations. When multiple cores are available to create independent intelligent subsystems, the possibilities for independent operation are wide ranging, and each subsystem can be more easily customized for the particular needs of the application.

Do not overlook intelligent analog

It is easy to focus on the digital peripherals and overlook the new features in analog peripherals that give them increased levels of intelligence as well. The analog-to-digital converter (ADC) included on advanced MCUs like the Renesas RL78 family is capable of independent operation, similar to that described for digital peripherals like serial ports. For example, intelligent ADCs can be configured to make periodic measurements when triggered by a hardware timer, completely independent of the CPU. Captured values can be stored sequentially into memory using a DMA function, and the CPU need not become involved until enough measurements have been made to require processing. In digital signal processing (DSP) applications, it might take a thousand measurements before processing is required. During this time the CPU can perform other tasks, or it can be put into a low-power sleep mode, with a timer interrupt used to wake it when enough samples have been acquired. Both processing and power efficiency are much improved over an implementation where the CPU must capture and store each ADC measurement itself.

You might think that this level of intelligent, autonomous operation is enough, but as they say in low-budget late-night TV commercials, “Wait, there’s more!” The Renesas RL78 ADC also has a windowing function that can further improve autonomous operation. This function allows the programmer to define a low and a high threshold (the window) for the captured ADC values, as illustrated in Figure 4. If the captured value falls outside the defined thresholds, an interrupt can be generated (if the ADRCK control bit is set to 1). Conversely, if the reverse behavior is desired, an interrupt can be generated when the value falls within the window. This feature allows a fast response if the analog value begins to wander outside the acceptable range. Without this level of intelligence, the CPU would have to wait until the full data set (perhaps a thousand measurements) was captured and then spend a significant number of cycles scanning the entire set to determine whether the value had started to move outside the acceptable range. If measurements are made every 10 µs and 1,000 measurements are taken each time, a violation on the first sample would not even be examined until roughly 10 ms later (the remaining 999 samples take 9.99 ms), and the time for the CPU to scan the entire data set, burning power the whole time, comes on top of that. Clearly, a windowing function like the one available on the Renesas RL78 can save significant processing time and power.

Figure 4: Range settings for the Renesas RL78 ADC windowing function (Courtesy of Renesas).
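The window check and the latency arithmetic above can both be written down compactly. This sketch models only the behavior the text describes. The function names are hypothetical, the window is assumed to be inclusive of both limits, and the `adrck` flag selects between the "outside window" and "inside window" interrupt conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RL78-style window compare: with adrck set,
   the interrupt fires when the result leaves [low, high]; with adrck
   clear, it fires when the result is inside the window. */
bool adc_window_irq(uint16_t result, uint16_t low, uint16_t high, bool adrck)
{
    bool inside = (result >= low) && (result <= high);
    return adrck ? !inside : inside;
}

/* Worst-case delay (µs) before a batch-scanning design even starts to
   examine a violation that occurred on the first sample of a batch:
   the remaining samples must still be captured. Scan time excluded.
   Assumes samples >= 1. */
uint32_t batch_detect_latency_us(uint32_t samples, uint32_t period_us)
{
    return (samples - 1u) * period_us;
}
```

For 1,000 samples at 10 µs, `batch_detect_latency_us(1000, 10)` gives 9,990 µs. The hardware window compare flags the same excursion at the end of the offending conversion, about 10 µs after sampling.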

Intelligent use of low-power modes

It is important to note that the ability to put inactive CPUs into a low-power mode can be a key technique for further improving power efficiency. A recent TechZone article, "Using MCU Power Management Options to Optimize System Efficiency," provides an excellent resource for understanding the wide variety of low-power modes available, so we can forgo a detailed discussion here. Our key takeaway about low-power modes is that intelligent peripherals, thanks to their autonomous operation, provide many opportunities to put CPUs into low-power states, “saving” them for the complex data-processing tasks they are best at. When low-power modes are used in conjunction with intelligent peripherals, the resulting improvements in power and processing efficiency can be dramatic.

To sum up, MCUs have developed several autonomous functions that can be used to offload low-level processing tasks for managing peripherals and their associated data transfer functions. New multicore MCUs provide even more opportunities to create and use intelligent peripherals, targeted to the specific needs of the application. The use of intelligent peripheral subsystems can dramatically improve both processing and power efficiency when properly integrated into an MCU-based application. Do not overlook these opportunities in your designs.

For more information on the MCUs discussed here, use the links provided to access product pages on the Hotenda website.