As a ‘sea of gates’, FPGAs provide an almost limitless platform for any digital functionality that can be implemented using logical expressions realized through Look-Up Tables of various widths. Since their inception, this has offered an unprecedented level of flexibility, while their uniformity and array-like structure have made them early adopters of the latest fabrication nodes.
Despite being constantly at the leading edge of semiconductor technology, FPGAs can still be limited by performance, because the architecture relies on increasingly complex routing patterns and clock structures. It is for this reason that vendors first implemented critical functions using hard-wired blocks of Intellectual Property (IP).
The inexhaustible demand for greater performance and optimal power consumption has driven FPGA vendors to integrate an increasing amount of fixed-functionality IP into their products. While stalwarts may see this as an erosion of the core benefits of FPGA technology, in truth it provides access to complementary functions that would be difficult or untenable to implement in the logic fabric.
One of the core markets for high-end FPGAs has always been telecommunications; here, performance is key and cost can be secondary. To meet this demand, FPGAs now integrate a large amount of IP targeting the data path, such as high-speed Ethernet, as well as other serial transceivers and, most recently, optical interfaces. These sit beside other performance-critical functions such as PLLs and DSP blocks. The ability to add a soft-core processor to handle control path functions means large FPGAs are still very popular in this field, often being preferred over ASICs.
While soft cores can tackle many tasks, particularly in the latest and fastest FPGAs, they remain limited and may not be suitable for tasks where performance is important. Consequently, and somewhat naturally, the extension of IP integration saw the introduction of ‘hard’ processor cores, a concept that has taken several iterations to fully realize but is now making a significant impact in many application areas.
Early attempts to embed hard processor cores in FPGAs failed to excite the industry or revolutionize the technology: too much, too soon, perhaps. The industry as a whole fell back on the soft-core option, but it soon became clear that there really was a market for embedded processors; it just needed the right solution. Arguably, that turned out to be ARM.
Today there are a number of FPGAs that offer embedded ARM cores, as well as soft-core options, and the reason is clear: performance is guaranteed. Furthermore, the ability to extend a hard core with hardware acceleration in the FPGA fabric has opened up even more application areas, which FPGA vendors are now keen to explore.
Conceptually, FPGA vendors refer to the integration of logic fabric and hard IP as a System-on-Chip (SoC) approach, a term used extensively by processor vendors who integrate other functions. This includes processors that offer configurable logic, such as the PSoC family from Cypress. In terms of performance, embedded processor sub-systems are not just for the very high end; Altera now offers an ARM-based SoC solution in its high-performance Stratix, its mid-range Arria and low-cost Cyclone families.
At the system level, the inclusion of a processor sub-system in an FPGA does not need to dominate it physically. Figure 1 shows a representation of Altera’s Arria V, in which the Hard Processor System (HPS), based on the ARM Cortex-A9 MPCore sub-system, occupies only a small part of the silicon. This is reflected in Figure 2, the block diagram of the low-cost Cyclone V, which also integrates the HPS.
Figure 1: Altera’s Arria V family of mid-range SoCs features the vendor’s Hard Processor System (HPS).
Figure 2: Altera’s low-cost Cyclone V also features its HPS, providing either a single- or dual-core ARM Cortex-A9 sub-system alongside its FPGA fabric.
In this configuration, each core includes 32 kbytes of L1 instruction cache and 32 kbytes of L1 data cache, a single- and double-precision floating-point unit and a NEON media engine, with CoreSight debug and trace. A further 512 kbytes of shared L2 cache is also integrated, along with 64 kbytes of scratch RAM. A range of memory and general-purpose interfaces is included, with up to 134 general-purpose I/O. Importantly, the HPS and the FPGA can operate independently but remain tightly coupled through a high-bandwidth system interconnect built using ARM’s AMBA AXI bus bridges, which give the HPS access to the FPGA fabric and vice versa. Both bridges are AMBA AXI-3 compliant. A dedicated 32-bit configuration port allows the HPS to configure the FPGA at start-up.
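To make the bridge concept more concrete, the sketch below shows one way software running on the HPS under Linux might read a 32-bit register implemented in the FPGA fabric, by memory-mapping a physical address window through /dev/mem. It is an illustration only, not the vendor's recommended method. The bridge base address, the register offset and the helper names (LW_BRIDGE_BASE, EXAMPLE_REG_OFFSET, page_align, read_reg32) are assumptions introduced for this example and should be checked against the device handbook and the address map of the actual design.

```python
import mmap
import os
import struct

# Assumed physical base of an HPS-to-FPGA bridge window. This value is an
# assumption for illustration; confirm it in the device documentation.
LW_BRIDGE_BASE = 0xFF200000
# Hypothetical offset of a 32-bit register implemented in the FPGA fabric.
EXAMPLE_REG_OFFSET = 0x10


def page_align(phys_addr, page_size=mmap.PAGESIZE):
    """Split a physical address into (page-aligned base, offset within page)."""
    base = phys_addr & ~(page_size - 1)
    return base, phys_addr - base


def read_reg32(phys_addr, dev="/dev/mem"):
    """Read one little-endian 32-bit word at a physical address.

    Requires root privileges on the target. The address is expected to be
    4-byte aligned so that the word does not cross a page boundary.
    """
    base, offset = page_align(phys_addr)
    fd = os.open(dev, os.O_RDONLY | os.O_SYNC)
    try:
        # Map a single page that contains the register of interest.
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as window:
            return struct.unpack_from("<I", window, offset)[0]
    finally:
        os.close(fd)
```

On a target board, a call such as read_reg32(LW_BRIDGE_BASE + EXAMPLE_REG_OFFSET) would return the current value of the hypothetical register, provided the FPGA has been configured and the bridge is enabled. A production design would more likely use a kernel driver or a UIO device than raw /dev/mem access.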
Variations on a theme
Like Altera, Xilinx has adopted the dual-core ARM Cortex-A9 MPCore sub-system for its SoC offering. Unlike its competitor, however, instead of supporting three families at different performance/price points, it has chosen to implement a range of performance levels across a single family: the Zynq-7000.
Figure 3: The Zynq-7000 family from Xilinx also offers a dual-core ARM Cortex-A9 sub-system with a range of FPGA fabric density options.
Like Altera’s Cyclone V, Zynq-7000 also supports full or partial reconfiguration, allowing part of the FPGA to continue to run while the remaining part is being reconfigured, without stopping the overall system. The processor sub-system is also capable of running independently of the FPGA logic.
While the previous two examples use a ‘classic’ SRAM approach to the FPGA fabric, which needs its configuration data loaded at power-up, there are also Flash-based SoC devices integrating a hard-core processor that offer more of an ‘instant on’ use-case, such as the SmartFusion2 family from Microsemi.
Figure 4: Block diagram of Microsemi’s SmartFusion2 SoC FPGA.
Figure 5: The ARM Cortex-M3 sub-system of Microsemi’s SmartFusion2.
Unlike the other SoCs featured here, the SmartFusion2 family specifically targets low power, as is evident from the choice of a Cortex-M3 core. This core is predominantly associated with microcontroller applications, whereas the Cortex-A class favored by Altera and Xilinx is more appropriate for Applications Processor-type use-cases. At this level of integration, the SmartFusion2 starts to resemble other solutions that are not typically considered FPGAs, such as the PSoC 5 family from Cypress; however, with an FPGA fabric of up to 150,000 logic elements, the SmartFusion2 family retains its FPGA credentials.
The integration of FPGA and processor core technology has a long and eventful history. Initially showing little promise, it was perhaps in danger of being written off completely. Fortunately for all developers, the FPGA vendors showed a level of vision that ensured its revival, helped significantly by the success of soft cores.
Now the embedded hard core is firmly ‘embedded’ in the engineer’s toolbox of solutions, and for many applications it really can represent a complete system-on-a-chip. It doesn’t stop there: Altera has already announced its next-generation Stratix 10 family, which will integrate four 64-bit ARM Cortex-A53 cores and be constructed using Intel’s Tri-Gate technology, giving it truly unprecedented levels of performance.
The future of the FPGA SoC looks set to be just as eventful as its past, but probably much more successful.