Using MCUs in Signal-Processing Applications

As embedded applications increasingly handle voice, video, and other high-speed data, the need for digital signal processing has become a major challenge for inexpensive, low-power MCUs, not to mention for embedded developers. MCU manufacturers have added DSP functions to some MCUs to expand their capabilities, creating a class of MCUs referred to as digital signal controllers (DSCs). This article points out the difficulties presented by DSP algorithms and examines a number of DSCs that are up to the challenge.

The need for speed

Almost all MCUs can handle basic control applications, but processing high-speed audio and video signals in real time is far more computationally intensive. DSP involves the algorithms and techniques used to manipulate these signals after they have been converted into digital form. Different algorithms are needed to enhance images, recognize or generate speech, or compress data for transmission or storage. Implementing them entirely in software would be unacceptably slow, so some sort of hardware acceleration is needed.

Even if a general-purpose MCU could handle high-speed signal processing, its power budget would take a serious hit. Adding specialized DSP chips or moving to high-speed RISC processors, FPGAs, or ASICs could address both problems, but at considerably higher expense in terms of cost, board space, and design complexity. The embedded developer's first line of defense is the proper choice of algorithm: finding one that is minimally compute-intensive for the task at hand, and then finding an MCU that is optimized to handle it.

Digital filters

Almost all signals require some sort of filtering, whether to reduce noise, remove unwanted signal components, or limit the bandwidth of the input signal. The two basic types of digital filters are infinite impulse response (IIR) and finite impulse response (FIR). An IIR filter is a feedback (recursive) system whose output depends on the current and previous inputs as well as on previous outputs. For a given frequency-response specification, an IIR filter typically needs far fewer coefficients than an FIR filter, so it is less computationally intensive. However, IIR filters have issues with stability, phase linearity, and sensitivity to quantization noise; they generally require 32-bit hardware to implement them efficiently.
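
As a rough illustration of the feedback structure described above, here is a minimal sketch of one second-order IIR section (a "biquad") in Direct Form I. It uses floating point for readability; the type and function names (biquad_t, biquad_step) are hypothetical, and coefficient design is out of scope. On a fixed-point MCU the structure is the same, but the coefficients and state would be quantized, which is where the stability and quantization-noise concerns come into play.

```c
/* One second-order IIR section (biquad), Direct Form I:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 * The a1/a2 terms feed previous outputs back into the result; that
 * feedback is what makes the impulse response infinite. */
typedef struct {
    float b0, b1, b2;   /* feedforward coefficients */
    float a1, a2;       /* feedback coefficients (a0 normalized to 1) */
    float x1, x2;       /* previous two inputs */
    float y1, y2;       /* previous two outputs */
} biquad_t;

/* Process one input sample and return one output sample. */
float biquad_step(biquad_t *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;

    /* Shift the input and output delay lines by one sample. */
    f->x2 = f->x1;
    f->x1 = x;
    f->y2 = f->y1;
    f->y1 = y;
    return y;
}
```

Higher-order IIR filters are usually built as a cascade of such sections. With b0 = 1, a1 = -0.5, and all other coefficients zero, an impulse input produces 1, 0.5, 0.25, and so on: an output that, in exact arithmetic, decays but never reaches zero.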

FIR filters generate output based only on the current and previous inputs. With no feedback loop, they are inherently stable, they can be designed to have exactly linear phase, and they are much less sensitive to quantization noise than IIR filters. FIR filters are simpler to design than IIR filters, but they are more computationally intensive, so designers need to watch the tradeoff between speed and power. FIR filters can be implemented in 16-bit fixed-point hardware and even, given a sufficiently fast CPU and a multiply-accumulate (MAC) unit, on an 8-bit MCU.
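
The sketch below shows a 16-bit fixed-point (Q15) FIR filter that keeps its delay line in a circular buffer, so each new sample overwrites the oldest one instead of shifting the whole buffer. The names and the 64-tap limit are illustrative assumptions. The 64-bit accumulator stands in for the wide accumulator (with guard bits) that a hardware MAC unit typically provides, and the last step rounds the Q30 sum back to Q15 with saturation.

```c
#include <stdint.h>

#define FIR_MAX_TAPS 64   /* illustrative upper bound for this sketch */

typedef struct {
    const int16_t *coeffs;          /* h[0..ntaps-1] in Q15 */
    int16_t delay[FIR_MAX_TAPS];    /* circular delay line of past inputs */
    unsigned ntaps;                 /* caller ensures 1 <= ntaps <= FIR_MAX_TAPS */
    unsigned head;                  /* index of the newest sample */
} fir_q15_t;

void fir_q15_init(fir_q15_t *f, const int16_t *coeffs, unsigned ntaps)
{
    f->coeffs = coeffs;
    f->ntaps = ntaps;
    f->head = 0;
    for (unsigned i = 0; i < FIR_MAX_TAPS; i++)
        f->delay[i] = 0;
}

/* Computes y[n] = sum over k of h[k] * x[n-k]: one MAC per tap. */
int16_t fir_q15_step(fir_q15_t *f, int16_t x)
{
    /* Move head back one slot (wrapping) and overwrite the oldest sample;
       nothing else in the buffer moves. */
    f->head = (f->head == 0) ? f->ntaps - 1 : f->head - 1;
    f->delay[f->head] = x;

    int64_t acc = 0;                /* wide accumulator, like a MAC's guard bits */
    unsigned idx = f->head;         /* walks from x[n] toward older samples */
    for (unsigned k = 0; k < f->ntaps; k++) {
        acc += (int32_t)f->coeffs[k] * f->delay[idx];   /* multiply-accumulate */
        idx = (idx + 1 == f->ntaps) ? 0 : idx + 1;
    }

    acc += 1 << 14;                 /* add half an LSB so the shift rounds */
    acc >>= 15;                     /* Q30 -> Q15 (arithmetic shift assumed) */
    if (acc > INT16_MAX) acc = INT16_MAX;   /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With four taps of 0.25 (8192 in Q15), a constant input of 4000 yields 1000, 2000, 3000, and then 4000 as the delay line fills: a simple moving average. The inner loop is exactly one multiply-accumulate per tap, the operation a single-cycle MAC accelerates.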

Data transforms

Signal processing often requires rapidly transforming signals back and forth between the time and frequency domains. DSP-enabled MCUs can often handle these transforms without resorting to external DSPs or FPGAs. The discrete Fourier transform (DFT) is widely used for spectral analysis, data compression, and convolution, but computing it directly takes on the order of N² complex operations for N samples, which is far too computationally intensive for MCUs. Hence the popularity of the fast Fourier transform (FFT), which delivers the same results with far less effort (Figure 1). The FFT algorithm consists of additions and multiplications that are well suited to a MAC unit.

N (number of input samples)      8      256       1,024        8,192
DFT (complex calculations)      64   65,536   1,048,576   67,108,864
FFT (complex calculations)      12    1,024       5,120       53,248

Figure 1: DFT and FFT complex calculations for varying N (Courtesy of Silicon Labs).
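
The counts in Figure 1 appear to follow the usual textbook estimates: N² complex multiply-adds for a direct DFT and (N/2)log2(N) complex multiplications for a radix-2 FFT (N/2 butterflies in each of log2(N) stages). Assuming those formulas, the short sketch below reproduces the table; the function names are hypothetical and N must be a power of two.

```c
#include <stdint.h>

/* Direct DFT: N output bins, each needing N complex multiply-adds. */
uint64_t dft_complex_ops(uint64_t n)
{
    return n * n;
}

/* Radix-2 FFT: log2(N) stages of N/2 butterflies, one complex multiply each. */
uint64_t fft_complex_ops(uint64_t n)
{
    unsigned stages = 0;
    for (uint64_t m = n; m > 1; m >>= 1)
        stages++;                   /* stages == log2(N) for power-of-two N */
    return (n / 2) * stages;
}
```

For N = 8192 this gives 67,108,864 versus 53,248, a ratio of roughly 1,260 to 1, which is why the FFT is practical on an MCU and the direct DFT is not.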

The FFT is still computationally intensive, especially as the number of samples N grows. Large speed gains can often be realized by first optimizing the algorithm, for example by removing unnecessary calculations and additions of zero terms. Then look for an MCU with single-cycle MACs, barrel shifters, and other hardware DSP features that enable high-speed signal processing.

A few examples

Silicon Labs highlights the ability of its 8-bit C8051F12x and C8051F360 MCUs to run FIR filters in real time while still leaving ample CPU resources available for other tasks. The C8051F124 includes a 50 MHz/50 MIPS 8051 CPU core; 8- and 12-bit ADCs; a 12-bit DAC; 8/16-bit PWM; 256 bytes of RAM; and 128 Kbytes of flash memory. Each output sample of an FIR filter requires one multiply and one accumulate per tap, which is precisely the operation the chips' MAC engines are optimized for. Silicon Labs supplies an FIR example with its DSP-enabled MCUs that uses a circular buffer and a mirroring optimization, the latter reducing data movement operations to the MAC by approximately 25 percent. For readers wishing to look into this further, Hotenda carries the Silicon Labs C8051F360-TB Target Board, which comes with example C source code.
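
The article does not define "mirroring." One common technique that fits the description, and which we assume here, exploits the symmetric coefficients of a linear-phase FIR filter (h[k] = h[N-1-k]): each coefficient is fetched once and applied to a sample and its mirror image. That takes three operand fetches per pair of taps instead of four, a 25 percent reduction, which may be what the Silicon Labs figure refers to; treat that correspondence as our assumption. The function name is hypothetical, and the circular buffer is omitted for brevity.

```c
#include <stdint.h>

/* Symmetric (linear-phase) FIR: h[k] == h[ntaps-1-k], so only the first
 * (ntaps+1)/2 coefficients are stored in h_half.
 * x[0] is the newest sample x[n]; x[ntaps-1] is the oldest. */
int16_t fir_symmetric_q15(const int16_t *h_half, const int16_t *x, unsigned ntaps)
{
    int64_t acc = 0;                            /* wide accumulator */
    unsigned pairs = ntaps / 2;

    for (unsigned k = 0; k < pairs; k++) {
        int32_t c = h_half[k];                  /* coefficient fetched once... */
        acc += (int64_t)c * x[k];               /* ...applied to x[n-k]... */
        acc += (int64_t)c * x[ntaps - 1 - k];   /* ...and to its mirror image */
    }
    if (ntaps % 2 != 0)                         /* odd length: unpaired center tap */
        acc += (int64_t)h_half[pairs] * x[pairs];

    acc += 1 << 14;                             /* round before the Q30 -> Q15 shift */
    acc >>= 15;
    if (acc > INT16_MAX) acc = INT16_MAX;       /* saturate to 16 bits */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

For example, the three-tap filter {0.25, 0.5, 0.25} applied to samples {1000, 2000, 3000} returns 2000, while storing only two coefficients.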

Moving up to 16 bits, Microchip Technology's dsPIC30F, dsPIC33E, and dsPIC33F DSCs can easily handle higher-order FIR filters. In fact, it was Microchip that coined the term “digital signal controller” in 2002 with the launch of its 6000 series DSCs; the company has since built out its DSP-enabled MCU product line (dsPIC). All dsPIC devices incorporate integrated DSP capabilities, including single-cycle 16 x 16 MACs; 40-bit accumulators; dual-operand fetches; saturation and rounding modes; free libraries; and low-cost filter design tools.
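
Saturation and rounding modes matter because fixed-point sums can overflow. The sketch below is a portable C model of the general idea, not Microchip's hardware behavior or library API: a 40-bit accumulator (32 bits plus 8 guard bits) that saturates instead of wrapping, and a helper that rounds the result back to a 16-bit Q15 value. All names are hypothetical.

```c
#include <stdint.h>

/* Range of a 40-bit two's-complement accumulator. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)   /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)          /* -2^39     */

/* Multiply two Q15 operands and add the Q30 product, saturating at 40 bits. */
int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    acc += (int64_t)a * b;
    if (acc > ACC40_MAX) acc = ACC40_MAX;
    if (acc < ACC40_MIN) acc = ACC40_MIN;
    return acc;
}

/* Round the Q30 accumulator to Q15 and saturate to the 16-bit range. */
int16_t acc40_to_q15(int64_t acc)
{
    acc += (int64_t)1 << 14;    /* add half an LSB (round half up) */
    acc >>= 15;                 /* arithmetic shift assumed */
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (int16_t)acc;
}
```

Thirty-two full-scale products (32767 x 32767) sum to 34,357,641,248, which would overflow a 32-bit register but fits easily in 40 bits. The guard bits let a long filter accumulate without intermediate overflow, and saturation applies only when the final result is narrowed.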

The 70 MIPS dsPIC33EP256 incorporates a single-cycle MAC with dual-data fetch; two 40-bit-wide accumulators; a single-cycle, mixed-sign MUL plus hardware divide; 32-bit multiply support; 15 DMA channels; and up to seven PWM pairs with independent timing and 8.2 ns resolution. The dsPIC33E can handle fairly complex filtering applications (Figure 2). Hotenda carries Microchip's product training module “Introduction to dsPIC33F Architecture,” which gives more detailed information. For those wishing to check it out for themselves, there is the dsPIC30F Demo Board, a development and evaluation tool supporting a number of dsPIC30F devices for sensor, motor control, and general-purpose applications.

Function              Conditions*       Execution time @ 70 MIPS
Vector Dot Product    N = 32            1.7 μs
Matrix Add            C = 8, R = 8      3.1 μs
Matrix Transpose      C = 8, R = 8      3.4 μs
Block IIR Canonic     N = 32, S = 4     17.0 μs
Block FIR             N = 32, M = 32    17.5 μs
Complex FFT**         N = 64            55.6 μs
*C = number of columns, N = number of samples, M = number of taps, S = number of sections, R = number of rows
**The complex FFT routine inherently prevents overflow.
1 cycle = 14.29 ns @ 70 MIPS

Figure 2: Example dsPIC DSP performance (Courtesy of Microchip).
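
Figure 2's timings can be translated into cycle counts, since at 70 MIPS one instruction cycle is about 14.29 ns. For example, the Block FIR entry (N = 32 samples, M = 32 taps) takes 17.5 μs, or 17.5 x 70 = 1,225 cycles. If we assume the benchmark performs N x M = 1,024 multiply-accumulates, that works out to roughly 1.2 cycles per MAC, including loop and setup overhead. The helpers below simply encode that arithmetic; the function names are illustrative.

```c
/* Instruction cycles for an execution time at a given instruction rate:
 * cycles = t[us] * MIPS (one MIPS is one instruction cycle per microsecond). */
double cycles_from_time(double t_us, double mips)
{
    return t_us * mips;
}

/* Average cycles per multiply-accumulate for a block FIR that is assumed
 * to perform samples * taps MACs in total. */
double cycles_per_mac(double t_us, double mips, unsigned samples, unsigned taps)
{
    return cycles_from_time(t_us, mips) / ((double)samples * (double)taps);
}
```

Applied to the other rows, the same arithmetic shows, for instance, that the 32-element dot product (1.7 μs) takes about 119 cycles, so fixed overhead is a noticeable share of such short routines.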

Freescale’s 60 MHz 16-bit MC56F8367 incorporates numerous DSP features, including a single-cycle 16 x 16-bit parallel MAC; four 32-bit accumulators; 512 Kbytes of program flash; 32 Kbytes each of data flash and data RAM; two PWM modules with 12 outputs; and 12-bit ADCs with 16 self-calibrating inputs. The MC56F8367 aims to provide “32-bit performance with 16-bit code density” in automotive, industrial control, medical, power management, and home appliance applications.

The Texas Instruments TMS320F2802 is a 60 MHz 32-bit DSC with 32 K x 16 flash and 6 K x 16 SRAM. Targeting high-speed motor control and power factor correction (PFC) applications, the TMS320F2802 includes a dual MAC that can perform 16 x 16-bit and 32 x 32-bit MAC operations; a high-resolution PWM module that can generate up to 16 PWM outputs; and a 12-bit ADC with a 2 x 8-channel input multiplexer and a 160 ns (6.25 Msample/s) conversion rate. TI's product training module “Motor Signal Chain Overview” explains how to use this and other motor signal chain components in typical applications.

While Renesas does not call its DSP-optimized RX621 MCU a DSC, it nevertheless meets all the qualifications: a 32-bit barrel shifter; 32-bit register-based MAC (up to 80-bit results); an 8-channel, 12-bit ADC; 512 Kbytes of program flash; and 96 Kbytes of RAM. The 32-bit 100 MHz RX621 includes a single-precision FPU that supports add/subtract/compare/multiply/divide and other DSP-intensive instructions. Numerous communications interfaces are provided (CAN, Ethernet, I²C, SPI, USB) to suit a wide range of applications. The Renesas “RX Core” Product Training Module explains the chip in more detail, and the Renesas RX62N Demo Kit lets you evaluate it in action.


Adding large single-cycle MACs, barrel shifters, and other hardware support for DSP instructions enables DSCs to take on relatively high-speed signal processing chores. By using DSCs, embedded developers can often avoid the cost of moving to a more expensive processor or of adding a separate signal-processing chip.