Processors Offer Sophisticated Graphics Support for an Increasingly Visual World

Just conveying plain text is not good enough these days. To entice us and keep our attention in a world of overly stimulated senses, any display of data must now appeal to our aesthetic nature. As a result, even basic information must be presented in vividly rendered colors with complex shading, animation, and video.

However, high resolution, deep color palettes, real-time animation, and video all take a toll on a processor’s ability to construct, render, and display pages. We expect to see clean, flicker- and artifact-free pages. As display resolutions climb, each page carries more content, background processing, and data movement, and rendering it consumes more and more of a processor’s resources, especially time and memory.

This article looks at high-end processors with enough horsepower, resources, and architectural streamlining to support higher-end graphics. These typically employ 32-bit or wider internal data paths and high clock speeds. Internal high-speed cache RAM and interleaved DMA are important here, too. All parts, data sheets, tutorials, and development kits referenced here can be found on the Hotenda website.

Requirements and choices

Most of us are not so fortunate that our designs will be produced in the millions. As a result, ASICs often are not a cost-, time-, or risk-effective option, and we are left specifying higher-cost FPGAs or a meaty enough processor that can handle graphics while still performing other system functions.

Video today is not a trivial task. Formatting data, arbitrating sprites, performing overlays, scrolling, color blending, and more are all processor- and data-path-intensive tasks. Three-dimensional (3D) rendering and shading add another layer of expected functionality that eats up processor resources and time.
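To give a feel for why an operation such as color blending is data-path intensive, here is a minimal sketch of per-pixel alpha blending. It is illustrative only: the function names, the 8-bit RGB tuple format, and the integer rounding scheme are assumptions made for this example, not the method used by any part discussed in this article.

```python
def blend_pixel(fg, bg, alpha):
    """Blend one RGB foreground pixel over a background pixel.

    fg, bg: (r, g, b) tuples with 8-bit channels (0-255).
    alpha:  foreground opacity, 0 (transparent) to 255 (opaque).
    """
    # Integer math with rounding; each channel needs two multiplies and a divide.
    return tuple((f * alpha + b * (255 - alpha) + 127) // 255
                 for f, b in zip(fg, bg))


def blend_line(fg_line, bg_line, alpha):
    """Blend one scan line, pixel by pixel."""
    return [blend_pixel(f, b, alpha) for f, b in zip(fg_line, bg_line)]


# Half-transparent red over blue.
mixed = blend_pixel((255, 0, 0), (0, 0, 255), 128)
```

Every pixel on every blended layer pays this cost on all three color channels, which is why the work is usually pushed to dedicated hardware or a dedicated processor.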

While some high-performance processors can be programmed to act as a dedicated graphics peripheral processor, these parts typically can handle many additional tasks. What’s more, the high-end peripherals they contain can include a projected-capacitance touch interface, as well as stereo sound, Ethernet, and other communications protocols, allowing the application processor to grind away with the fewest interruptions. It is almost like having another core. A high-performance DMA and external bus interface logic also are usually present, as is a lot of general-purpose I/O.

A typical tablet can have from 1,024 x 600 to 2,560 x 1,600 pixels in its TFT display. This wide range imposes severe constraints on a system’s performance and especially on the amount of memory resources needed (Table 1). With a full 24-bit color palette, a 2,560 x 1,600 display such as those used in some Samsung Galaxy, Amazon Kindle, and Toshiba tablets will eat up over 12 Mbytes of RAM for a single page.

Table 1: Typical tablet displays vs. page memory.
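As a rough check on the page-memory figures, the arithmetic can be sketched as follows. The sketch assumes packed 24-bit pixels with no row padding and takes 1 Mbyte as 1,000,000 bytes; the function name is chosen for this example, and only the two resolutions named in the text are shown.

```python
def page_bytes(width, height, bits_per_pixel=24):
    """Bytes needed to hold one full page (frame) of pixels."""
    return width * height * bits_per_pixel // 8


for w, h in [(1024, 600), (2560, 1600)]:
    size = page_bytes(w, h)
    print(f"{w} x {h}: {size:,} bytes ({size / 1e6:.2f} Mbytes)")

# Prints:
# 1024 x 600: 1,843,200 bytes (1.84 Mbytes)
# 2560 x 1600: 12,288,000 bytes (12.29 Mbytes)
```

A real frame buffer may be larger if the display controller pads each pixel to 32 bits or aligns each row, and double buffering doubles the total.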

Even with a 32-bit-wide data bus, a full redraw of a 30-frame-per-second (fps) video stream at 2,560 x 1,600 with 24-bit color means moving 368,640,000 bytes every second, or 122,880,000 single-cycle 24-bit pixel transfers. That leaves about 8.1 ns per pixel transfer (roughly 2.7 ns per byte), a lot to ask from a processor that is both rendering and moving data. Even if you are not doing real-time decoding of compressed video formats such as MPEG, rendering is not as simple a task as you may think.
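The arithmetic behind these figures can be checked with a short sketch. It assumes a 2,560 x 1,600 display, packed 24-bit pixels, a full redraw of every frame, and no blanking or bus overhead; the function name is chosen for this example.

```python
def frame_bandwidth(width, height, fps, bytes_per_pixel=3):
    """Return (pixels per second, bytes per second) for a full redraw."""
    pixels_per_second = width * height * fps
    bytes_per_second = pixels_per_second * bytes_per_pixel
    return pixels_per_second, bytes_per_second


pixels, nbytes = frame_bandwidth(2560, 1600, 30)
print(f"{pixels:,} pixels/s, {nbytes:,} bytes/s")
print(f"{1e9 / pixels:.1f} ns per 24-bit pixel, {1e9 / nbytes:.1f} ns per byte")

# Prints:
# 122,880,000 pixels/s, 368,640,000 bytes/s
# 8.1 ns per 24-bit pixel, 2.7 ns per byte
```

The 368,640,000 figure is therefore a byte rate. Moving one whole 24-bit pixel per bus cycle relaxes the budget to about 8.1 ns per transfer, which is still demanding once rendering traffic shares the same bus.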

Another factor is internal high-speed cache RAM and external bus interface speeds. At high clock speeds, wait states on external bus RAM will starve the processors and degrade performance. A good SDRAM interface on your processor’s external bus interface can mean pages can be refreshed in the background, allowing the processor to focus full bore on the rendering.

A high-performance processor can be dedicated to graphics control, especially if it can move data in and out very quickly. One example of a well-implemented high-speed SDRAM interface can be found in the Freescale MPC8245LVV333D, which centers on the company’s PowerPC MPC603e core and is part of the MPC82xx series. This 32-bit, 352-pin processor clocks in at 333 MHz, and some family members can go even faster. The processor is available in 2, 1.8, and 1.5 V versions, which can help reduce power in this 4.5-million-transistor device.

The MPC8245 combines the PowerPC architecture with a PCI bridge so that designers can rapidly build systems using peripherals designed for PCI and other standard interfaces. Its core can operate at a variety of frequencies, allowing the designer to trade off performance for power consumption. SDRAM syncing and driving are handled independently by the peripheral control unit, which also interleaves DMA functionality into the external bus interface logic, allowing autonomous high-speed data transfer to displays and leaving the processor’s horsepower free to render in the background (Figure 1).

Figure 1: A streamlined processor block coupled with an advanced high-speed external memory interface is a good candidate for a dedicated graphics processor and can act as a high-end semi-autonomous peripheral. It may be all that is needed for some applications.

Note the SDRAM interface supporting up to 2 Gbytes of SDRAM. The high-bandwidth bus can use 32- or 64-bit transfer cycles. The integrated DMA controller allows scatter-gather operations and supports DMA chaining, which automatically links DMA buffers. This eliminates yet another task from the main micro’s agenda.
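The idea behind DMA chaining and scatter-gather can be modeled in a few lines of software. The sketch below is a conceptual model only: the descriptor fields, class name, and functions are hypothetical and do not reflect the MPC8245's actual descriptor format or registers. Each descriptor names one buffer and links to the next, so the engine can gather scattered buffers into one contiguous destination without the processor stepping in between transfers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DmaDescriptor:
    src: int                                 # offset of this buffer in source memory
    length: int                              # number of bytes to move
    next: Optional["DmaDescriptor"] = None   # next descriptor; None ends the chain


def build_chain(buffers):
    """Link (src, length) pairs into a chain and return the head descriptor."""
    head = None
    for src, length in reversed(buffers):
        head = DmaDescriptor(src, length, head)
    return head


def run_chain(head, memory, dest):
    """Walk the chain as a DMA engine would, gathering each buffer into dest."""
    desc = head
    while desc is not None:
        dest += memory[desc.src:desc.src + desc.length]
        desc = desc.next
    return dest


# Three "scan lines" scattered through memory, gathered in display order.
memory = bytearray(b"....LINE2...LINE1......LINE3")
chain = build_chain([(12, 5), (4, 5), (23, 5)])
frame = run_chain(chain, memory, bytearray())
print(frame.decode())  # LINE1LINE2LINE3
```

In hardware, the processor would typically set up the chain once and hand over only the address of the first descriptor, then return to rendering while the transfers proceed.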

TI, a longtime developer and pioneer of digital imaging and graphics, offers serious contenders with its OMAP35x processors and tools. The OMAP35x platform is built around a powerful 600 MHz superscalar ARM® Cortex™-A8 core and is offered as four applications processors.

Of particular interest is the OMAP3530, touted as having best-in-class video, image, and graphics processing, and providing direct support for streaming video, 2D/3D mobile gaming, and video capture. The OMAP3530 contains a graphics accelerator and dedicated video input and video output ports.

Especially notable are the 64-channel DMA support and the low-power DDR interface. As with other processors, RAM takes up a lot of die area, and in this case the 64 Kbytes of general-purpose RAM will typically hold a scan line or two at a time. Also helpful is the availability of up to 256 Kbytes of level-2 cache RAM on-chip, which can hold templates and some background graphics information. Note, too, that another 96 Kbytes of RAM is available for DSP rendering and usage. A hardware-based graphics accelerator is included on the 3530 as well.

Both TI and third-party development tool makers support these parts. The BeagleBoard from Circuitco Electronics supports the OMAP3530 and illustrates connections to SDRAM, S-Video, and DVI-D interfaces, as well as all LCD interface signals (Figure 2). A BeagleBoard Product Training Module is available online at Hotenda, as well as a video illustration of how to use BeagleBoards running Linux to drive a pico projector, which may be a video interface of interest.

Figure 2: The BeagleBoard takes advantage of OMAP processor architecture and serves as a powerful test and development platform, especially for embedded Linux designs. It is well suited for display driving and pico projectors.

Logic PD makes the SOMOMAP3530-11-1782JFIR, a development and evaluation platform that supports the OMAP3530 and combines the ARM Cortex-A8 processor with a TMS320C64x DSP. While this is geared more toward signal processing development, it is also a usable platform development tool.

Multi-core alternatives

A multi-processor chip may be a good alternative to implementing a multi-processor board. While there may be some bandwidth limitations when sharing the same memory or peripheral buses, multi-core processors have demonstrated their ability to perform when tasks are partitioned efficiently.
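One common way to partition graphics work is to give each core its own band of scan lines. The sketch below shows only the bookkeeping for such a split. The function name and the band scheme are assumptions made for illustration, not a description of how any particular part schedules its cores.

```python
def split_rows(height, workers):
    """Divide `height` scan lines into `workers` contiguous, near-equal bands.

    Returns a list of (first_row, end_row) pairs, with end_row exclusive.
    """
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        rows = base + (1 if i < extra else 0)  # spread any remainder over the first bands
        bands.append((start, start + rows))
        start += rows
    return bands


print(split_rows(1600, 4))
# [(0, 400), (400, 800), (800, 1200), (1200, 1600)]
```

Equal-height bands balance the load only when each band costs about the same to render. Because the cores still share memory and peripheral buses, the bandwidth limits mentioned above continue to apply.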

A widely known and supported multi-core technology family comes from Freescale: its powerful ARM Cortex-A9-based i.MX 6 series of processors, with scaled flavors up to a quad-core part at 1.2 GHz with 64-bit DDR3 and two 32-bit DDR2 interfaces.

A Freescale i.MX 6SoloLite part suitable for 2D and 3D graphics, the MCIMX6L3DVN10AA, is a 432-pin ROMless processor with a single 1 GHz core. Of note are the 256 Kbytes of on-chip RAM and a power supply that runs down to 0.95 V.

These parts are scalable and are made with multimedia and graphics in mind. A dedicated Hardware Graphics Accelerator block performs vector, 2D and 3D graphics and unburdens the processor from these data-intensive operations. Another dedicated block designated Image Processing handles enhancement, inversion, rotation, scrolling, resizing, and blending, to name a few functions. Still another dedicated hardware block interfaces to cameras and displays (Figure 3).

Figure 3: Best performance can come when refined and dedicated hardware blocks can take advantage of high-speed buses and deep memory pools. This can provide ASIC-like levels of performance at lower cost with more integrated peripherals.

Freescale’s larger 624-pin MCIMX6S5DVM10AB is also a ROMless design with dedicated graphics hardware and an ARM Cortex-A9. Dual and quad cores are higher up the food chain and absorb functionality that single cores just cannot handle in real time. For example, the i.MX 6 family also includes quad-core 1.2 GHz parts such as the MCIMX6Q5EYM10AC. These parts also feature dual 2D graphics engines, as well as 3D support with four additional shaders.

In summary, while an ASIC may be an effective way to provide sophisticated graphics and video, most designers cannot afford the cost or time of specifying one. Fortunately, as this article has demonstrated, a variety of good high-end processors is available with well-thought-out video capabilities and features to do the job.

For more information on the parts mentioned here, use the links provided to access product information pages on the Hotenda website.