Divide and Conquer Works for Dual-Core MCUs, Too

The NXP Semiconductors LPC4350 is the first implementation of 32-bit ARM® Cortex™-M4 and Cortex-M0 cores on a single chip. The M4 can devote itself to high-speed data-plane processing while the M0 handles lower-speed control tasks. By properly partitioning application software, designers can leverage the chip’s heterogeneous multicore architecture to create applications that handle multiple tasks simultaneously in an energy-efficient manner. Target applications include motor control, industrial automation, white goods, embedded audio, RFID readers, and power management.

This article examines the features and functionality of the LPC4350 and includes a hands-on review of the LPC4300 evaluation kit, demonstrating how to code, download, and debug programs. It also highlights the part’s unique State Configurable Timer (SCT) and Serial GPIO (SGPIO) interface.

Working ARM in ARM

The dual-core LPC4350 joins two code- and tool-compatible ARM processors that share the same buses and can work closely together. The 32-bit ARM Cortex-M4 incorporates a three-stage pipeline with separate local instruction and data buses, as well as a third bus for peripherals. It also includes an internal prefetch unit that supports speculative branching. The M4 core supports single-cycle DSP and SIMD instructions and includes an integrated hardware floating-point processor. In any given application, the M4 core does the high-speed heavy lifting.

The low-power Cortex-M0 core, on the other hand, typically handles non-time-critical supervisory chores; it is designed as a low-cost 32-bit replacement for existing 8/16-bit microcontrollers. Like its big brother the M4, the M0 processor runs at up to 204 MHz, but with a simpler instruction set and reduced code size.

Figure 1 shows the LPC4350’s basic architecture. The ARM Cortex-M4 includes three AHB-Lite buses: the system bus, the I-code bus, and the D-code bus. The I-code and D-code core buses allow for concurrent code and data accesses from different slave ports. A multilayer AHB matrix connects the ARM Cortex-M4 buses and other bus masters to peripherals so that peripherals on different slave ports of the matrix can be accessed simultaneously by different bus masters. This allows for a great deal of flexibility in application processing.

Figure 1: LPC4350 block diagram (Courtesy of NXP).

The Cortex-M0 core can smoothly offload work from the M4 core since most peripheral interrupts are connected to both processors. GPIO registers are located on the shared AHB bus to minimize latency. The two cores communicate with each other by using shared SRAM as a mailbox, with one processor raising an interrupt in the other processor’s Nested Vectored Interrupt Controller (NVIC) to indicate that it has delivered a message. The other processor returns the favor to acknowledge receipt.
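The mailbox handshake described above can be sketched in a few lines of C. This is a host-side model under stated assumptions, not the code from NXP's example: the struct layout and the names mailbox_t, mbx_send, mbx_receive, and mbx_take_ack are hypothetical, and the two boolean flags merely stand in for the interrupts that each core would raise in the other core's NVIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout. On real hardware this struct would be placed
 * in SRAM that both cores can see; the flags model the cross-core interrupts. */
typedef struct {
    volatile uint32_t msg;         /* payload written by the sender */
    volatile bool     msg_pending; /* stands in for "interrupt raised at receiver" */
    volatile bool     ack_pending; /* stands in for "interrupt raised back at sender" */
} mailbox_t;

/* Sender: write the payload, then signal the other core.
 * Returns false if the previous message has not been consumed yet. */
bool mbx_send(mailbox_t *mbx, uint32_t value)
{
    assert(mbx != NULL);
    if (mbx->msg_pending) {
        return false;
    }
    mbx->msg = value;
    mbx->msg_pending = true;  /* assumption: real code raises the IRQ here */
    return true;
}

/* Receiver (would run in its interrupt handler): read, then acknowledge. */
bool mbx_receive(mailbox_t *mbx, uint32_t *out)
{
    assert(mbx != NULL && out != NULL);
    if (!mbx->msg_pending) {
        return false;
    }
    *out = mbx->msg;
    mbx->msg_pending = false;
    mbx->ack_pending = true;  /* assumption: real code raises the return IRQ here */
    return true;
}

/* Sender: consume the acknowledgement so the next message can be tracked. */
bool mbx_take_ack(mailbox_t *mbx)
{
    assert(mbx != NULL);
    if (!mbx->ack_pending) {
        return false;
    }
    mbx->ack_pending = false;
    return true;
}
```

In this model a second mbx_send fails until the receiver has read the first message, which mirrors the send-then-acknowledge pattern in the text. On real silicon the payload write would also have to be made visible to the other core before the interrupt is raised, typically with a memory barrier.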

A unique feature of the LPC4350 is sixteen Serial GPIOs that offer standard GPIO functionality, enhanced with features to accelerate serial-stream processing. Each SGPIO I/O slice can perform serial-to-parallel or parallel-to-serial data conversion. Additionally, the slices are double buffered and contain a 32-bit FIFO that can shift the input value from a pin or the output value to a pin on every clock cycle.
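To make the serial-to-parallel idea concrete, here is a small software model of one slice in C. It is only a sketch: the type slice_model_t and the functions slice_clock_in and slice_feed are hypothetical names, the LSB-first bit order is an assumption, and real SGPIO slices are configured through hardware registers rather than called like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one double-buffered slice. */
typedef struct {
    uint32_t shift;   /* register being filled, one bit per clock */
    uint32_t shadow;  /* last completed word, readable while shifting continues */
    uint8_t  count;   /* bits shifted into 'shift' so far */
    bool     ready;   /* true once 'shadow' holds a completed word */
} slice_model_t;

/* Clock one input bit in, LSB first (an assumption of this sketch).
 * Returns true when a full 32-bit word has just been latched into 'shadow'. */
bool slice_clock_in(slice_model_t *s, unsigned pin_level)
{
    assert(s != NULL);
    /* Shift right and insert the new bit at the top, so the first bit
     * received ends up in bit 0 after 32 clocks. */
    s->shift = (s->shift >> 1) | ((uint32_t)(pin_level & 1u) << 31);
    s->count++;
    if (s->count < 32u) {
        return false;
    }
    s->shadow = s->shift;  /* swap into the second buffer */
    s->shift = 0u;
    s->count = 0u;
    s->ready = true;
    return true;
}

/* Convenience helper: feed the low 'nbits' bits of 'bits', LSB first.
 * Returns how many complete words were latched along the way. */
unsigned slice_feed(slice_model_t *s, uint32_t bits, unsigned nbits)
{
    unsigned latched = 0u;
    assert(nbits <= 32u);
    for (unsigned i = 0u; i < nbits; i++) {
        if (slice_clock_in(s, (bits >> i) & 1u)) {
            latched++;
        }
    }
    return latched;
}
```

The point of the second register is that software can read a completed word from shadow while the next word is still arriving bit by bit in shift.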

Another unique feature is the State Configurable Timer (SCT), which can trigger a counter or set a timer based on a state variable such as a limit, halt, or stop condition. The SCT can be configured as two 16-bit counters or one 32-bit counter. You might use the SCT to change the clock speed of the CPU in response to a change in core temperature or after a certain number of external events. (For more information on the SCT, see the TechZone article, “Take control: How NXP's Patent-Pending SCT Improves Motor Control”.)
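One way to picture a state-driven timer is as a counter plus a table of events, each of which is enabled only in certain states. The C model below is a simplified sketch along those lines, not a description of the SCT's actual register interface: the types sct_event_t and sct_model_t, the functions sct_tick and sct_run, and every field name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event descriptor: fires on a counter match, but only in
 * the states selected by state_mask. */
typedef struct {
    uint32_t match;      /* counter value that triggers the event */
    uint32_t state_mask; /* bit n set => event is enabled in state n */
    bool     out_level;  /* output level driven when the event fires */
    uint8_t  next_state; /* state entered when the event fires */
    bool     is_limit;   /* true => the event also resets the counter */
} sct_event_t;

typedef struct {
    uint32_t counter;
    uint8_t  state;
    bool     output;
} sct_model_t;

/* Advance the model by one clock. Returns the index of the event that
 * fired, or -1 if none did. At most one event fires per tick here. */
int sct_tick(sct_model_t *t, const sct_event_t *ev, size_t n_events)
{
    assert(t != NULL && ev != NULL);
    assert(t->state < 32u);
    t->counter++;
    for (size_t i = 0; i < n_events; i++) {
        bool enabled = ((ev[i].state_mask >> t->state) & 1u) != 0u;
        if (enabled && t->counter == ev[i].match) {
            t->output = ev[i].out_level;
            t->state = ev[i].next_state;
            if (ev[i].is_limit) {
                t->counter = 0u;
            }
            return (int)i;
        }
    }
    return -1;
}

/* Run 'ticks' clocks and return how many events fired in total. */
unsigned sct_run(sct_model_t *t, const sct_event_t *ev, size_t n_events,
                 unsigned ticks)
{
    unsigned fired = 0u;
    for (unsigned i = 0u; i < ticks; i++) {
        if (sct_tick(t, ev, n_events) >= 0) {
            fired++;
        }
    }
    return fired;
}
```

With two events, one enabled in state 0 that drives the output high at count 3 and one enabled in state 1 that drives it low and resets the counter at count 5, the model produces a repeating pulse without any CPU involvement between ticks, which is the appeal of a state-configurable timer in motor-control work.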

The SPI Flash Interface (SPIFI) lets the Cortex-M4 connect to low-cost serial flash memory with little performance penalty compared to higher-pin-count parallel interfaces. Using SPIFI, the M4 core can address the entire range of flash memory via processor or DMA channels at data rates of up to 40 MB/s.

Eval board features

The LPC4300 evaluation board (Figure 2) is built around the LPC4350FET256,551, with 65 MB SDRAM, 32 MB parallel flash, 512 kB SRAM, and a serial EEPROM. The board includes a wide range of I/O interfaces including CAN, UART, USART, Ethernet, USB (Host, Device, and OTG), HDMI, and audio IN and OUT. There are additional connectors for JTAG, an external power supply, a SIM card, a power MOSFET, and a serial port. If you want more, there is room to solder on an expansion header and a small breadboarding area, not to mention innumerable jumpers to give you control over the board’s numerous configurations.

Figure 2: LPC4300 evaluation board (Courtesy of NXP).

The software distribution that comes with the board contains a number of examples that run on the LPC4350 development system. In addition, it contains an ARM CMSIS DSP software library, which lets you exercise the Cortex-M4’s floating-point DSP capabilities.

One of the headers on the board interfaces with the Keil ULINK2 debugger, which works with the Keil μVision4 IDE, and a limited version of the ARM Keil MDK Toolkit comes with the board. I used it to compile, download, and run over 50 example programs.

Getting started with a kit of this complexity is not exactly simple, but it is straightforward. First, download and unzip the example files and flash drivers, then move them into the appropriate directories. If you are using the Keil MDK, the driver library binaries have several configurations built in, including building for internal SRAM (the fastest), SPIFI, and Hitex Flash (external parallel flash on the Hitex board). Set the jumpers as directed in the Getting Started guide, connect the ULINK2 debugger, connect to a PC using a USB cable, and you are ready to go.

Every evaluation kit ever designed seems to include a “blinky” program, and this one is no exception. In fact, it has two: one runs under an RTOS and one runs without. Much more interesting is the Dual Core MBX (mailbox) Example, which shows how to independently control the Cortex-M4 and M0 cores.

In this example, the Cortex-M4 is the master and the Cortex-M0 is the slave, and each core gets its own instance of μVision. To get started, I double-clicked on the project file (M_Mo_ipc.uvmpw), which started μVision. I selected M0 as the active project (Figure 3), LPC43xx_M0_FLASH as the target, and rebuilt the project. Then I did the same thing for the M4.

Figure 3: Building the M4 and M0 projects.

After configuring ULINK2 and the flash programming utility, I was able to download and run the program. Basically, this is yet another “blinky” program, albeit a sophisticated one, with the M4 controlling one LED and the M0 another. I was able to step through the code, set and remove breakpoints, and study the interaction between the two cores. The exercise was informative, and the code is a useful framework for more complex programs, any of which will benefit from drawing on some of the other example programs, the peripheral drivers in particular.


The LPC4350 is a unique chip that brings the advantages of heterogeneous multicore processing down to the MCU world. The ability to harness both cores in tandem may enable you, in some applications, to dispense with a context-switching mechanism or even an RTOS. The LPC4350 is a natural fit for deeply embedded designs where a dynamic tradeoff between speed and power makes sense. If that describes a design you have in mind, consider checking it out using the LPC4300 evaluation board.