
Designing with Image Sensors: Acquiring a Clear Picture

While your counterparts in marketing might be chanting a "more megapixels are better" mantra, resist the temptation of that siren song. They will often be wrong.

Not so long ago, the hottest debate in the imaging world was whether (and if so, when) systems designers should transition from legacy CCD image sensors to their then-embryonic CMOS-based successors. Fast-forward just a few short years and the technology transition is nearly complete. All but the most demanding applications – in terms of resolution, frame rate, low-light sensitivity, etc. – have enthusiastically embraced the CMOS upstart. Hotenda's supported product line reflects this reality, being flush with CMOS imaging sensors from companies such as Aptina LLC and OmniVision Technologies.

CMOS sensors' ascendance is only the latest case study of the inevitability of Moore's Law. CCD sensors' heavy analog circuit dependency means that they can only be manufactured on specialty processes supported by a short list of foundries and captive-fab IC vendors. Over time, those specialty processes have also been increasingly unable to keep pace with the lithography progression of their commodity CMOS counterparts, resulting in considerably higher cost-per-pixel. Finally, CMOS sensors' conventional process foundation enables the optional integration of memory and logic processing blocks alongside the pixel circuits.

Sensor shortcomings

Just as things are rarely black-and-white in the engineering world, the transition to CMOS sensors has proven to be a mixed blessing. CMOS sensors display higher "dark current" than CCDs, resulting in poorer low-light performance and more restricted dynamic range. Also, CCDs directly output a per-pixel accumulated charge, which off-chip processing translates into corresponding voltage measurements and transforms from the analog to the digital domain.

Conversely, CMOS sensors embed the charge-to-voltage function blocks, along with amplification, A/D conversion, and other circuits, thereby negatively impacting the per-pixel fill factor — the percentage of each pixel's total silicon area devoted to photon capture versus other functions (see Figure 1).

Figure 1: The increased on-die peripheral circuitry of a CMOS sensor leads to a lower fill factor (the percentage of the die devoted to photon capture versus other functions) than with the CCD precursor. (Courtesy of Eastman Kodak).

Microlenses and other similar-function structures located above the pixels, along with advanced pixel design and interconnect techniques, can counteract fill factor shortcomings to some degree (see Figure 2). However, they are not a perfect solution, and their inclusion negatively impacts sensor manufacturing costs when compared against, for example, a microlens-less alternative. Backside illumination, a design technique that relocates active matrix transistors and their interconnect traces below the pixels' photosensitive layers, also improves low ambient illumination results.

Figure 2: Microlenses, located above the image sensor pixels (a) in an array (b), reduce but don't eliminate fill factor-related low-light performance shortcomings (c), which become more acute as pixel dimensions decrease. (Courtesy of Eastman Kodak).

Moore's Law trends mentioned earlier — trends that enable ever-higher levels of circuit integration over time on a given-sized sliver of silicon — generally have been friendly to semiconductor-fabricated devices. However, the news has not been as encouraging for image sensors. Shrink the pixel dimensions in search of ever-higher pixel counts and, fill factor shortcomings aside, you inherently also reduce the number of photons that each pixel's sensor can capture and accumulate in a given period of time. Looking at Aptina's offerings in Hotenda's product catalog, for example, you will notice that while VGA resolution sensors have pixel dimensions of approximately 6 x 6 μm, Aptina's 5 MP sensors reduce the per-pixel size to ~2.2 x 2.2 μm, with higher-resolution offerings shrinking to an even smaller 1.67 x 1.67 μm pixel metric.
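The light-gathering penalty of pixel shrinks can be sketched quickly. A pixel's photon capture scales roughly with its photosensitive area; the Python snippet below applies that simplification to the Aptina pixel pitches cited above (it deliberately ignores fill factor, microlenses, and quantum-efficiency differences, so treat the ratios as illustrative):

```python
# Relative photon-gathering area of the pixel geometries cited in the text.
# Linear scaling of captured photons with pixel area is a simplifying
# assumption (fill factor, microlenses, and QE differences are ignored).

def pixel_area_um2(pitch_um: float) -> float:
    """Photosensitive area of a square pixel, in square micrometers."""
    return pitch_um * pitch_um

vga_area = pixel_area_um2(6.0)    # ~6 x 6 um VGA-class pixel
mp5_area = pixel_area_um2(2.2)    # ~2.2 x 2.2 um 5 MP-class pixel
hi_area  = pixel_area_um2(1.67)   # ~1.67 x 1.67 um higher-resolution pixel

print(f"VGA pixel gathers ~{vga_area / mp5_area:.1f}x the light of a 5 MP pixel")
print(f"VGA pixel gathers ~{vga_area / hi_area:.1f}x the light of a 1.67 um pixel")
```

Even under these generous assumptions, each 2.2 μm pixel sees roughly one-seventh the light of its 6 μm VGA-class counterpart, which is why per-pixel sensitivity falls as resolution climbs.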

Other image sensor vendors' products have made similar pixel dimension reductions as they climb the resolution ladder. If you attempt to counteract the reduced photon count trend by amplifying the signal coming out of the pixel, you also amplify the noise. The result, as a cursory perusal of online consumer feedback databases will quickly reveal, is abundant user frustration with newer and supposedly ‘better' cameras that deliver poorer overall images than their lower-resolution predecessors.

Memory, processing burdens

Noise and other egregious artifact-filled snapshots are not the only themes you will encounter at online feedback pages and other like-minded sites. Another common complaint involves slow shutter-button-press-to-image-capture speeds, along with related long snap-to-snap latencies — the still image equivalent of low frame rates in video capture. Both factors are fundamentally tied to increased per-picture resolution and, therefore, to an increased per-picture processing burden. Simple math highlights the root cause — the 5 MP (~2592 x 1944 active pixel) sensor has more than 16 times the pixels of its VGA (640 x 480 active pixel) sibling, requiring correspondingly more processing horsepower to suppress high-resolution-induced noise.
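The "simple math" is worth making explicit; a two-line Python check confirms the pixel-count ratio quoted above:

```python
# Compare active-pixel counts of the two sensor classes discussed in the text.
vga_pixels = 640 * 480      # VGA sensor: 307,200 active pixels
mp5_pixels = 2592 * 1944    # ~5 MP sensor: 5,038,848 active pixels

ratio = mp5_pixels / vga_pixels
print(f"The 5 MP sensor has {ratio:.1f}x the pixels of the VGA sensor")
# -> roughly 16.4x, matching the "more than 16 times" figure in the text
```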

Moore's Law trends influence image processors' capabilities over time, too. Logic integration and clock speed improvements to some extent have counterbalanced sensor resolution-driven increased processing demands. Fortunately, image compression and other processing tasks are particularly amenable to parallel-processing techniques. Nonetheless, all other factors being equal, a design based around higher-resolution still images (or frames, in the case of video) demands a larger, more expensive, and more power-hungry image processor than does a smaller-resolution alternative design approach.

Keep in mind, too, that the high-priced processor will not be the only IC that negatively impacts the total bill of materials cost. More pixels demand more RAM to hold them (and intermediary copies of them) during the sequence of processing functions between initial capture, final archive, and eventual transfer. More and larger memory devices mean more cost incurred, more power burned (including periodic refresh, in the case of DRAM), more interface pins needed, and more board space consumed. Conversely, with low-resolution sensors' outputs, you might even be able to leverage the embedded RAM array within a SoC instead of needing to rely on standalone memory devices.
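A rough frame-buffer sizing exercise shows how quickly resolution inflates RAM demands. In the sketch below, the bytes-per-pixel figures and the two-working-copy pipeline are illustrative assumptions, not values from the text:

```python
# Back-of-the-envelope frame-buffer sizing for an image processing pipeline.
# Assumed: one raw Bayer frame at 2 bytes/pixel plus two RGB working copies
# at 3 bytes/pixel. Real pipelines differ; the point is the scaling.

def buffer_bytes(width: int, height: int, bytes_per_pixel: int, copies: int = 1) -> int:
    """Total bytes for `copies` frame buffers of the given geometry."""
    return width * height * bytes_per_pixel * copies

MIB = 1024 * 1024

vga = buffer_bytes(640, 480, 2) + buffer_bytes(640, 480, 3, copies=2)
mp5 = buffer_bytes(2592, 1944, 2) + buffer_bytes(2592, 1944, 3, copies=2)

print(f"VGA pipeline RAM:  ~{vga / MIB:.1f} MiB")  # may fit a SoC's embedded RAM
print(f"5 MP pipeline RAM: ~{mp5 / MIB:.1f} MiB")  # likely demands external DRAM
```

Under these assumptions the VGA pipeline needs a couple of MiB, while the 5 MP pipeline needs close to 40 MiB — the difference between embedded SRAM and standalone DRAM devices on the bill of materials.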

Storage and transmission demands

After processing is complete, the resulting still image or video frame sequence is frequently stored for some period of time in resident nonvolatile memory before being moved to an HDD, optical disc, magnetic tape, or flash memory. Uncompressed image archiving requires not only very fast storage write speeds — especially when high resolutions are involved — but also daunting storage capacities for reasonable still image counts and video capture times. Therefore, lossy compression frequently finds use in shrinking the per-frame payload while preserving an acceptable approximation of the original lossless images.

Still, even after factoring in lossy compression's often significant byte count reductions, the larger the resolution of the source content, the larger the compressed version of that content. Generational evolutions in lossy compression algorithms, such as from MPEG-2 to MPEG-4 or VC-1 for video, or from JPEG to JPEG XR or WebP for still images, conceptually allow for higher pixel counts with little to no appreciable increase in resultant file size. However, a one-generation algorithm increment often requires a near-exponential increase in required processing power and temporary memory footprint. The more aggressively one compresses an image in pursuit of a particular file size aspiration, the more likely viewers of the resultant material will discern distracting artifact errors.
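The capacity stakes can be quantified with a simple sketch. The 10:1 compression ratio below is an illustrative assumption (real JPEG-class ratios vary widely with content and quality settings), as is the 100-shot burst:

```python
# Storage needed for a batch of still images, uncompressed vs. lossy-compressed.
# The 10:1 ratio and 100-shot count are illustrative assumptions.

def storage_mb(pixels: int, bytes_per_pixel: int, shots: int, ratio: float = 1.0) -> float:
    """Megabytes needed for `shots` images, optionally lossy-compressed `ratio`:1."""
    return pixels * bytes_per_pixel * shots / ratio / 1e6

SHOTS = 100
VGA = 640 * 480      # 0.3 MP
MP5 = 2592 * 1944    # ~5 MP

print(f"VGA, uncompressed:  {storage_mb(VGA, 3, SHOTS):.0f} MB")
print(f"5 MP, uncompressed: {storage_mb(MP5, 3, SHOTS):.0f} MB")
print(f"5 MP, ~10:1 lossy:  {storage_mb(MP5, 3, SHOTS, ratio=10):.0f} MB")
```

Even after a 10:1 lossy squeeze, the 5 MP archive is still larger than the uncompressed VGA archive — compression narrows but never erases the resolution gap.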

Newer compression algorithms are also, by their very nature, less widely supported than their more mature predecessors, and their complexity tends to result in incompatibilities even within supposedly ‘supported' systems. MPEG-4, for example, encompasses a diverse assortment of Parts and Levels. The variant most commonly called ‘MPEG-4' in casual terminology — alternatively known as H.264, AVC, and JVT — is, strictly speaking, MPEG-4 Part 10, which itself consists of 17 different Profile versions. Conversely, by using a lower-resolution image sensor, you have the ability to capture the same number of pictures (or the same video runtime) as before, but you can instead leverage a less complex and more widely supported algorithm, such as MJPEG.

Locally archived content must, considering non-infinite storage capacity, sooner or later relocate elsewhere, in which case a shift in perspective from stored bytes to transferred bits is necessary. In the case of ‘live' streaming applications, such as video conferencing, temporary nonvolatile storage use is nonexistent. Modern wired protocols, such as USB 2.0, FireWire 400, and 100 Mbps Ethernet, thankfully deliver sufficient bandwidth for many still image and video transfer applications. Technology successors such as USB 3.0, FireWire® 800, GbE, and the Intel-championed Thunderbolt™ (formerly LightPeak) extend this success into the high-definition era, in some cases by migrating from copper to optical fiber as the physical interconnect medium.
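The gap between those interconnect classes is easy to see with best-case transfer-time arithmetic. The link rates below are nominal peak signaling rates (real-world sustained throughput is considerably lower due to protocol overhead and contention), and the 1.5 GB payload and 2 Mbps cellular figure are illustrative assumptions:

```python
# Best-case time to offload an archive over various links.
# Nominal peak link rates; sustained real-world throughput is lower.

LINKS_MBPS = {
    "USB 2.0":            480,
    "FireWire 400":       400,
    "100 Mbps Ethernet":  100,
    "cellular WAN (typ.)":  2,   # illustrative sustained figure, not a spec rate
}

payload_bits = 1.5e9 * 8   # 1.5 GB of compressed stills (assumed payload)

for name, mbps in LINKS_MBPS.items():
    seconds = payload_bits / (mbps * 1e6)
    print(f"{name:>20}: {seconds / 60:.1f} minutes")
```

The wired links finish in seconds to minutes; the WAN-class link takes well over an hour for the same payload, which is exactly the order-of-magnitude gap the next paragraph describes.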

The bandwidth situation is not nearly as upbeat with wireless local network technologies, nor is it as sanguine when the data being transferred extends beyond the LAN to a wired or wireless WAN tether. Peak bandwidths in these cases are an order of magnitude or more lower, with reliable sustained speeds lower still.

Economic factors also beg for consideration. Broadband and cellular service providers' desire that users not swamp their networks leads to bandwidth caps and expensive overage charges, along with intermediary bandwidth ‘throttles' triggered by usage during heavy network load periods, especially when it exceeds poorly documented thresholds.

The typical user will not understand — and for that matter should not need to understand — the underlying reasons for these setbacks. All the user will know, for example, is that pictures take a long time to upload, video stutters, and the monthly service bill is substantially higher than it should be. All of these are compelling reasons why you should give serious consideration to including a much lower resolution image sensor in your design than a simplistic ‘more pixels are better' perspective might otherwise suggest.

Application requirements

Honestly ask yourself and your marketing counterparts just what image resolution is necessary to satisfy the users' visual quality expectations. The market has proven that the ‘moving pictures' nature of video relaxes per-frame resolution requirements as compared to a still image that, by its very nature, is amenable to more intense audience scrutiny.

After all, even an entry-level VGA sensor is capable of supporting the 480-line resolution requirements of the DVD video format.

Technology veterans may also remember the postage stamp-size video delivered by initial iterations of Apple QuickTime®, Intel Indeo™, and other early video playback standards. Even the highest resolution variant of the United States' ATSC digital television standard only requires, at 1920 x 1080 pixels, 2 MP per frame – or said another way, 1 MP per 1080i interlaced field.

Even with still images, your users probably require fewer pixels than they think, especially if they also desire other attributes such as excellent low light quality, rapid snap-to-shoot and snap-to-snap speeds, and fast picture offload rates. Even if you assume that users are printing out the pictures at a relatively high quality 300 dpi setting, a 5 MP sensor will generate an interpolation-free 6" x 8" output shot.
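The 300 dpi claim is straightforward to verify: an interpolation-free print dimension is simply pixel count divided by print density. A quick Python sketch for the 2592 x 1944 (5 MP) geometry:

```python
# Maximum interpolation-free print size from a sensor at a given print density.

def print_size_inches(w_px: int, h_px: int, dpi: int) -> tuple[float, float]:
    """Return (width, height) in inches for a 1:1 pixel-to-dot print."""
    return w_px / dpi, h_px / dpi

w, h = print_size_inches(2592, 1944, 300)   # photo-quality 300 dpi
print(f"300 dpi: {h:.1f} x {w:.1f} inches")  # -> about 6.5 x 8.6, i.e. a 6 x 8 print

w, h = print_size_inches(2592, 1944, 150)   # snapshot-grade 150 dpi (assumed figure)
print(f"150 dpi: {h:.1f} x {w:.1f} inches")
```

Halving the print density doubles each linear print dimension, which is why lower-dpi snapshot printing leaves so much resolution headroom for cropping.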

More common, lower-dpi printing enables even larger interpolation-free prints from a 5 MP source. You can still generate high quality snapshot photos even if you aggressively crop the original image. Now, extrapolate and consider how large the high-quality pictures coming from a modern 14 MP point-and-shoot camera can be. Modern pixel interpolation techniques can extend these capabilities even further. Is it any wonder that an increasing number of technology analysts and journalists, not to mention knowledgeable users, are now claiming that the megapixel ‘race' is over?

Apple's approach

Apple's iPad® 2 tablet, along with the company's previously unveiled iPod® touch and iPhone® 4 siblings, provides an interesting case study of how design teams balanced contending and often contradictory trade-offs in defining and developing imaging subsystems (see Figure 3). The iPhone 4, made available for sale in June 2010, was the first iPhone family iteration to include a front-mounted image sensor intended for video conferencing applications such as Apple's FaceTime®.

The iPhone 4 front sensor's VGA resolution is capable of capturing standard-definition still images along with 480p video at up to 30 fps. The companion back-mounted image sensor, intended for conventional image capture applications, supports 5 MP still photos and 720p, 30 fps high-definition video, both representing upgrades from prior-generation Apple handsets, but undershooting the capabilities of standalone cameras (see Table 1). Yet the iPhone 4 recently achieved a notable metric when it became the most popular source of images hosted on the Flickr® photo-sharing site.

Figure 3: The iPad® 2 and fourth-generation iPod® touch® introduced image capture capabilities to both platform families for the first time. (Courtesy of Apple).

Product | Front camera (still) | Front camera (video) | Rear camera (still) | Rear camera (video)
--- | --- | --- | --- | ---
iPhone | N/A | N/A | 1600 x 1200 | N/A
iPhone 3G | N/A | N/A | 1600 x 1200 | N/A
iPhone 3GS | N/A | N/A | 2048 x 1536 | 480p 30 fps (constrained by Apple; the sensor is 720p-capable)
iPhone 4 | 640 x 480 | 480p 30 fps | 2592 x 1936 | 720p 30 fps

Table 1: Apple iPhone family variants and image capture capabilities.

The fourth-generation iPod touch®, unveiled in September 2010, was the first version of this particular product line to include an image sensor — two, actually. The front-mounted sensor spec mimics that of the iPhone 4, again with FaceTime and its video conferencing ilk in mind. However, the back-mounted sensor is substantially resolution-capped compared to the one in the iPhone 4 of three months before, supporting ‘only' 0.7 MP (960 x 720 pixel) still images, along with 720p 30 fps video. The same iPod touch sensor suite also found its way to the iPad 2 unveiled in March 2011, whose first-generation iPad predecessor provided no image capture capabilities.

Arguably, the iPad 2's back-mounted image sensor may have been included primarily for competitive positioning reasons. Alternative tablets, such as the Android™-based Motorola Xoom™, offer rear-facing cameras with still image resolutions up to 5 MP, even though scant-at-best evidence exists that users are employing any tablets as still and video camera surrogates.

Wrapping up

Apple's engineers clearly resisted the temptation to include the highest-available-resolution image sensors in their latest systems. You should, too, in many cases:

  • If your system will be used in low ambient light environments
  • If you have stringent bill of material cost requirements
  • If your design is space-, weight- or power-consumption constrained
  • If the system contains limited resident nonvolatile storage, and especially if capacity cannot be augmented via expansion slot-based memory cards, and/or
  • If the wired or wireless connectivity between the system and the outside world is bandwidth-limited or -capped, either for file transfer or live-streaming purposes.

On the other hand, if your target customers will be making large-format prints of the images they capture, if they'll be employing ‘digital zoom' techniques to capture only a portion of the image presented to the sensor, or if they'll aggressively crop the source image post-capture and prior to archive, then consider a higher resolution sensor than you might otherwise incorporate.