The use of virtualization is often thought of as an IT function that allows optimization of server computers. Some desktop computers also employ virtualization so they can run two operating systems. You might not know, however, that this technology is available for embedded systems. We’ll look at CPUs from Freescale, Intel, and ARM, after reviewing virtualization in general.
Following is a statement from Intel that does a good job of describing the advantages of virtualization: With virtualization, applications run in secure partitions that increase security by preventing unintended software interactions. The ability to have applications in partitions (virtual machines) facilitates software migration and consolidation, with (possibly) reduced software development effort and easing of system test demands. Embedded systems running a real-time operating system can also run a general-purpose OS, which typically provides a richer user interface.
Security is a key benefit. Everyone is, or should be, interested in security these days, especially when an embedded system is connected to the outside world. A network connection, or even a simple serial port for configuration, makes a system vulnerable to unauthorized, and possibly malicious, operation. Separating critical code from not-so-critical code via virtualization can have big benefits in reliability and may be required for safety. Virtualization can also help with certification: if you change something in one part of the system, you do not have to requalify the other part.
The term "hypervisor" was first used in 1965, referring to software used on an IBM research machine, and on the IBM 360/65, allowing it to share its memory: half acting as an IBM 360 and half as an emulated IBM 7080. The software, labeled "hypervisor," did the switching between the two modes on a split-time basis. IBM’s reimplementation of CP-67 for the System/370 was released in 1972 as VM/370.
There are two types of hypervisors:
- Type 1 (or native, bare metal) hypervisors run directly on the host's hardware to control the hardware and to manage guest operating systems. A guest operating system thus runs at the level above the hypervisor. Examples include Oracle VM Server for SPARC, Citrix XenServer, KVM, VMware ESX/ESXi, and Microsoft Hyper-V.
- Type 2 (or hosted) hypervisors run within a conventional operating system environment. With the hypervisor layer as a distinct second software level, guest operating systems run at the third level above the hardware. BHyVe, VMware Workstation, and VirtualBox are examples of Type 2 hypervisors.
Let’s now look at virtualization examples in embedded devices.
Freescale’s multicore QorIQ processors implement embedded virtualization capabilities, including a hypervisor privileged mode, guest interrupt injection, efficient guest translation look-aside buffer (TLB) management, and interpartition communication. The platform uses the virtualized cores to abstract peripheral and data path accelerators to support multiple OSs and applications in a single device. In addition, debugging and trace hardware is virtualized to speed up development and system validation.
QorIQ processors implement the virtualization extensions to the Power ISA that can be found at Power.org. Power ISA 2.06 was released in February 2009 and included extensions for the e500 multicore regarding hypervisor and virtualization on single and multicore implementations for the embedded market.
The Freescale P4040/P4080 (Figure 1) four- and eight-core processors feature a unique, three-tiered cache hierarchy. Each core has an integrated Level 1 (L1) cache, as well as a dedicated L2 backside cache that can significantly improve performance. A multi-megabyte L3 cache is also provided for those tasks for which a shared cache is desirable.
Figure 1: The eight-core Freescale P4080 supports a hypervisor and virtualization.
The CoreNet coherency fabric manages full coherency of the caches and provides on-chip, point-to-point connectivity with concurrent traffic to and from multiple resources. This cuts bus contention and latency issues associated with scaling shared bus/shared memory architectures.
The P4040/P4080 e500mc cores, with advanced virtualization technology, can work as four or eight symmetric multiprocessing (SMP) cores, as four or eight completely asymmetric multiprocessing (AMP) cores, or with varying degrees of independence. Full processor independence, including the ability to independently boot and reset each e500mc core, is an important characteristic of the devices. The chips’ embedded hypervisor provides safe and autonomous operation of the multiple individual operating systems, allowing them to share system resources.
The 1.5 GHz P4080/P4040 consume less than 30 W, include high-performance data path acceleration logic, and deliver high-performance networking services. The devices offer two 10 Gbit/s Ethernet (XAUI) controllers, eight 1 Gbit/s Ethernet (SGMII) controllers, three PCIe V2.0 controllers/ports running at up to 5 GT/s, and two Serial RapidIO 1.2 controllers/ports running at up to 3.125 Gbaud.
The P4080 and P4040 processors' virtualization enhancements include:
- An MMU for separating privileged and non-privileged domains
- An IOMMU that provides memory protection for device-to-memory DMA
- L3 cache partitioning that allows optimized allocation and configuration
- Coherence domains that allow cache coherency snoop traffic to be restricted to relevant cores only
- DPAA parse-classify-distribute functionality that allows port virtualization based on virtual MAC address, VLAN, or other classification methods
- A third (hypervisor) privilege level that allows the hypervisor to protect critical hardware resources while guests run in supervisor mode, reducing implementation overhead
- Extended virtual address space via a partition ID in the virtual address, which allows the MMU to contain translations from multiple partitions
- Direct interrupts for I/O — external interrupts can be sent directly to virtual machines without hypervisor mediation, for low latency
- Direct virtual machine access to machine status register that allows a guest OS to enable/disable interrupts without hypervisor mediation
- Direct system calls — user space to kernel system calls are direct without hypervisor mediation
- The P series does not have hardware logical to real address translation, which is available on Freescale’s e6500 core-based 64-bit processors
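The partition ID item in the list above can be illustrated with a toy model. The following is a minimal sketch, not a description of the e500mc MMU: the class name, the 4 KiB page size, and the addresses are all assumptions chosen for illustration. It shows only the core idea that tagging each translation with a partition ID lets entries from several guests coexist, even when the guests use the same virtual address.

```python
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages, for illustration only


class PartitionTaggedTlb:
    """Toy TLB whose lookup key includes a partition ID (hypothetical model)."""

    def __init__(self) -> None:
        # (partition_id, virtual_page_number) -> real_page_number
        self._entries: Dict[Tuple[int, int], int] = {}

    def insert(self, partition_id: int, virt_addr: int, real_addr: int) -> None:
        """Record a translation for one page of one partition."""
        key = (partition_id, virt_addr >> PAGE_SHIFT)
        self._entries[key] = real_addr >> PAGE_SHIFT

    def translate(self, partition_id: int, virt_addr: int) -> Optional[int]:
        """Return the real address, or None on a TLB miss."""
        page = self._entries.get((partition_id, virt_addr >> PAGE_SHIFT))
        if page is None:
            return None
        offset = virt_addr & ((1 << PAGE_SHIFT) - 1)
        return (page << PAGE_SHIFT) | offset


# Two partitions map the same virtual page to different real memory.
tlb = PartitionTaggedTlb()
tlb.insert(partition_id=1, virt_addr=0x1000, real_addr=0x8000_0000)
tlb.insert(partition_id=2, virt_addr=0x1000, real_addr=0x9000_0000)
```

Because the partition ID is part of the key, switching between guests does not require flushing the other guest's entries in this model.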
The ARM architecture received virtualization extensions two years ago, beginning with its Cortex-A15 and Cortex-A7 cores. Both cores support the ARM Virtualization Extensions to the ARMv7-A architecture, which bring a larger physical address space and hardware virtualization, as well as AMBA 4 ACE coherency that enables big.LITTLE processing. Both the A15 and the A7 processors currently target low-power portable devices such as cell phones and are not used in other embedded systems. Texas Instruments, Broadcom, Nvidia, and Samsung use A15 cores in versions of their processors.
ARM’s Cortex-R series cores, which target real-time applications, do not have virtualization extensions at this time. The Cortex-A series processors include security extensions known as TrustZone technology. Primarily intended for retail security, TrustZone enables system-wide security via an additional execution mode, the Secure Monitor Mode. Effectively, this can be used to implement a form of full virtualization that allows a single guest OS, and it is supported as such in commercial offerings like Green Hills Software’s INTEGRITY Secure Virtualization solutions.
In November 2005, Intel released two models of Pentium 4 (Model 662 and 672) as the first Intel processors to support VT-x technology. Today, hardware-assisted Intel VT and EPT (extended page tables) are built into most Xeon server platforms, as well as every Core i3/i5/i7 CPU. EPT hardware provides a separate set of page tables that translate from guest-physical addresses to the host-physical addresses that are used to access memory. As a result, guest software can be allowed to modify its own IA-32 page tables and directly handle page faults. This removes a major source of virtualization overhead. Some Atom Z500-series CPUs, and all E600-series CPUs, have VT-x (Figure 2), but none support EPT.
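The two layers of translation can be sketched as follows. This is a simplified model under stated assumptions, not Intel's implementation: both tables are flat dictionaries rather than multi-level structures, the page size is assumed to be 4 KiB, and the function and exception names are hypothetical. The point it illustrates is the split of ownership: the guest edits its own table and takes its own page faults, while only a miss in the second table needs the hypervisor.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class GuestPageFault(Exception):
    """Miss in the guest's own page table; the guest OS can handle this itself."""


class EptViolation(Exception):
    """Miss in the hypervisor's table; this is the case that needs the hypervisor."""


def translate(guest_virtual, guest_page_table, ept):
    """Two-stage lookup: guest-virtual -> guest-physical -> host-physical.

    Both tables are plain dicts mapping page number -> page number, a
    simplification of the real multi-level structures.
    """
    page, offset = divmod(guest_virtual, PAGE_SIZE)
    if page not in guest_page_table:
        raise GuestPageFault(hex(guest_virtual))
    guest_physical_page = guest_page_table[page]
    if guest_physical_page not in ept:
        raise EptViolation(hex(guest_physical_page * PAGE_SIZE + offset))
    return ept[guest_physical_page] * PAGE_SIZE + offset


guest_page_table = {0x10: 0x2}  # owned and edited by the guest OS
ept = {0x2: 0x7F}               # owned by the hypervisor
host_physical = translate(0x10ABC, guest_page_table, ept)
```

In this sketch the guest can add or remove entries in `guest_page_table` freely; it can never reach host memory that is absent from `ept`.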
Intel vPro technology has hardware in the CPU that traps and executes sensitive instructions, relieving the hypervisor of these duties. Additional features on some processors and chipsets also reduce significant I/O performance bottlenecks. Hypervisor mode enables unmodified ring-0 guest operating systems to execute with reduced privilege. For example, a guest OS cannot reference physical memory beyond what has been allocated to its virtual machine without causing a hypervisor trap.
Figure 2: Example system with VT-x capable processor and chipset.
The VT-x extensions enable selective exception injection, so hypervisor-defined classes of exceptions can be handled directly by the guest OS, avoiding the performance overhead of hypervisor involvement on each interrupt.
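One way to picture the "hypervisor-defined classes of exceptions" is as a bitmask with one bit per exception vector. VT-x keeps such an exception bitmap for each guest; the sketch below models only that idea. The helper names are hypothetical, and the extra qualification that real hardware applies to page faults is left out.

```python
PAGE_FAULT = 14          # x86 exception vector for #PF
GENERAL_PROTECTION = 13  # x86 exception vector for #GP
DIVIDE_ERROR = 0         # x86 exception vector for #DE


def make_exception_bitmap(intercepted_vectors):
    """Build a 32-bit mask with one bit set per intercepted exception vector."""
    bitmap = 0
    for vector in intercepted_vectors:
        if not 0 <= vector < 32:
            raise ValueError(f"exception vector out of range: {vector}")
        bitmap |= 1 << vector
    return bitmap


def exits_to_hypervisor(bitmap, vector):
    """True if this vector traps to the hypervisor, False if the guest handles it."""
    return bool(bitmap & (1 << vector))


# Assumed policy for illustration: intercept only general-protection faults.
bitmap = make_exception_bitmap([GENERAL_PROTECTION])
```

With this policy, a divide error or page fault in the guest would be delivered straight to the guest's own handler, and only a general-protection fault would involve the hypervisor.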
VT-d, in specific Intel chipsets, provides the ability to directly dedicate I/O device ranges to a guest OS, eliminating the need for a hypervisor trap due to an I/O interrupt. VT-d aids system protection by restricting DMA of the devices to pre-assigned domains or physical memory regions. This is achieved in hardware via DMA remapping.
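The protection side of DMA remapping can be modeled in a few lines. This is a sketch under assumptions, not the VT-d programming interface: the class names, device names, and address ranges are invented for illustration, and the address translation that real remapping hardware also performs is omitted. It shows a device being confined to the memory regions of the domain it was assigned to.

```python
from dataclasses import dataclass, field


@dataclass
class DmaDomain:
    """Memory that a group of devices may reach (hypothetical model, not a real API)."""
    name: str
    # List of (start, size) host-physical regions assigned to this domain.
    regions: list = field(default_factory=list)

    def allows(self, address, length):
        """True if the whole transfer fits inside one assigned region."""
        end = address + length
        return any(start <= address and end <= start + size
                   for start, size in self.regions)


class DmaRemapper:
    """Maps each device to a domain and vets every DMA request against it."""

    def __init__(self):
        self._domain_of = {}  # device id -> DmaDomain

    def assign(self, device_id, domain):
        self._domain_of[device_id] = domain

    def check(self, device_id, address, length):
        """Return True if the DMA may proceed; unassigned devices are blocked."""
        domain = self._domain_of.get(device_id)
        return domain is not None and domain.allows(address, length)


# Assumed layout: one 16 MiB region dedicated to a real-time guest.
rt_guest = DmaDomain("rt_guest", regions=[(0x1000_0000, 0x0100_0000)])
remapper = DmaRemapper()
remapper.assign("nic0", rt_guest)
```

A transfer by `nic0` inside the region passes the check, while a transfer that starts outside it, runs past its end, or comes from a device with no domain is refused.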
The virtualization software you select can make a big difference. Each package usually divides its operation into memory, disk I/O, network I/O, and other I/O, and the performance of each function is very application dependent.
One test case I found, which is strictly about a server implementation, tested VMware ESXi 5 + vSphere 5, Microsoft Hyper-V on Windows Server 2008 R2 SP1, Citrix XenServer 6, and Red Hat Enterprise Virtualization 2.2. It shows as much as a 40 percent networking performance hit for virtualization, and as much as 20 percent for disk I/O, in some virtualization software/OS combinations.
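To put those worst-case percentages in concrete terms, here is a small worked calculation. The native rates (a 1 Gbit/s link and a 200 MB/s disk) are assumptions picked only to make the arithmetic visible; they do not come from the test case.

```python
def effective_rate(native_rate, overhead_percent):
    """Rate left after a given percentage performance hit."""
    if not 0 <= overhead_percent <= 100:
        raise ValueError("overhead_percent must be between 0 and 100")
    return native_rate * (100 - overhead_percent) / 100


# Assumed native figures, chosen only to make the arithmetic concrete.
network_mbps = effective_rate(1000, 40)  # 1 Gbit/s link with a 40 percent hit
disk_mbps = effective_rate(200, 20)      # 200 MB/s disk with a 20 percent hit
```

Under these assumptions the link would deliver 600 Mbit/s and the disk 160 MB/s, which is why it pays to measure your own workload on the specific software/OS combination you plan to ship.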
There are a number of real-time virtualization software packages. You might want to look at Wind River, TenAsys (Figure 3), and Real-Time Systems GmbH, for starters.
Figure 3: The TenAsys eVM configuration.
Many industrial systems, including programmable logic controllers and motion controllers, require both low-latency, deterministic response and full-featured user interfaces. Virtualization can satisfy both objectives by enabling systems to simultaneously run real-time and general-purpose operating systems, each on dedicated processor cores of a multicore CPU. This can increase the speed and determinism of time-critical applications, because they operate unencumbered by non-real-time tasks. Furthermore, virtualization may enable functions running on multiple boards to be combined onto one board, which lowers platform cost and size.
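The idea of giving each operating system its own cores can be sketched as a simple planning step. The function below is hypothetical and does not reflect any vendor's configuration format; it only shows the constraint that matters, namely that the core sets handed to different guests never overlap and never exceed what the CPU has.

```python
def assign_dedicated_cores(total_cores, requests):
    """Give each guest its own disjoint set of core numbers.

    requests maps a guest name to the number of cores it needs. Cores are
    handed out in ascending order; leftover cores stay unassigned.
    """
    needed = sum(requests.values())
    if needed > total_cores:
        raise ValueError(f"need {needed} cores, only {total_cores} available")
    plan, next_core = {}, 0
    for guest, count in requests.items():
        plan[guest] = list(range(next_core, next_core + count))
        next_core += count
    return plan


# Assumed split for a quad-core CPU: one core for the RTOS, three for the GUI side.
plan = assign_dedicated_cores(4, {"rtos": 1, "general_purpose_os": 3})
```

Here the real-time OS gets core 0 to itself, so nothing scheduled by the general-purpose OS on cores 1 through 3 can delay it.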