The x86 provides a complex array of operation types, including a number of specialized instructions. The intent was to provide tools for the compiler writer to produce
optimized machine language translations of high-level language programs. Table 12.8
lists the types and gives examples of each. Most of these are the conventional
instructions found in most machine instruction sets, but several types of instructions
are tailored to the x86 architecture and are of particular interest. Appendix A of
[CART06] lists the x86 instructions, together with the operands for each and the
effect of the instruction on the condition codes. Appendix B of the NASM assembly
language manual provides a more detailed description of each x86 instruction. Both
documents are available at this book’s Web site.
CALL/RETURN INSTRUCTIONS The x86 provides four instructions to support
procedure call/return: CALL, ENTER, LEAVE, and RETURN. It will be instructive to
look at the support provided by these instructions. Recall from Figure 12.10 that a
common means of implementing the procedure call/return mechanism is via the use
of stack frames. When a new procedure is called, the following must be performed
upon entry to the new procedure:
• Push the return point on the stack.
• Push the current frame pointer on the stack.
• Copy the stack pointer as the new value of the frame pointer.
• Adjust the stack pointer to allocate a frame.
The CALL instruction pushes the current instruction pointer value onto the stack
and causes a jump to the entry point of the procedure by placing the address of the
entry point in the instruction pointer. In the 8088 and 8086 machines, the typical
procedure began with the sequence
PUSH EBP
MOV EBP, ESP
SUB ESP, space_for_locals
where EBP is the frame pointer and ESP is the stack pointer. In later
machines, the ENTER instruction performs all the aforementioned operations in a
single instruction.
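The stack-frame steps listed earlier can be traced with a small model. The following Python sketch is only an illustration under stated assumptions: the Machine class, its method names, and the addresses used are hypothetical, the stack is assumed to grow downward in 4-byte slots, and the epilogue models what LEAVE followed by a return would do.

```python
# Toy model of the stack-frame protocol; not a real x86 emulator.
class Machine:
    def __init__(self, stack_top=0x1000):
        self.mem = {}          # sparse memory: address -> 32-bit value
        self.esp = stack_top   # stack pointer (stack grows downward)
        self.ebp = 0           # frame pointer
        self.eip = 0           # instruction pointer

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        # CALL: push the return point, then jump to the entry point.
        self.push(return_address)
        self.eip = target

    def prologue(self, space_for_locals):
        # PUSH EBP / MOV EBP, ESP / SUB ESP, space_for_locals
        self.push(self.ebp)
        self.ebp = self.esp
        self.esp -= space_for_locals

    def epilogue_and_return(self):
        # Release the frame and restore the caller's frame pointer,
        # then pop the return point into the instruction pointer.
        self.esp = self.ebp
        self.ebp = self.pop()
        self.eip = self.pop()


m = Machine()
m.ebp = 0x2000                                  # caller's frame pointer (arbitrary)
m.call(target=0x400, return_address=0x104)
m.prologue(space_for_locals=16)
frame = (m.esp, m.ebp)                          # bounds of the new frame
m.epilogue_and_return()
```

After the epilogue, the stack pointer, frame pointer, and instruction pointer are back to the values the caller expects, which is the invariant the call/return instructions exist to maintain.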
The ENTER instruction was added to the instruction set to provide direct support for the compiler. The instruction also includes a feature for support of what are
called nested procedures in languages such as Pascal, COBOL, and Ada (not found
in C or FORTRAN). It turns out that there are better ways of handling nested
procedure calls for these languages. Furthermore, although the ENTER instruction saves a few bytes of memory compared with the PUSH, MOV, SUB sequence
(4 bytes versus 6 bytes), it actually takes longer to execute (10 clock cycles versus
6 clock cycles). Thus, although it may have seemed a good idea to the instruction
set designers to add this feature, it complicates the implementation of the processor
while providing little or no benefit. We will see that, in contrast, a RISC approach
to processor design would avoid complex instructions such as ENTER and might
produce a more efficient implementation with a sequence of simpler instructions.
MEMORY MANAGEMENT Another set of specialized instructions deals with memory
segmentation. These are privileged instructions that can only be executed from the
operating system. They allow local and global segment tables (called descriptor tables)
to be loaded and read, and for the privilege level of a segment to be checked and altered.
The special instructions for dealing with the on-chip cache were discussed in
Chapter 4.
STATUS FLAGS AND CONDITION CODES Status flags are bits in special registers
that may be set by certain operations and used in conditional branch instructions. The
term condition code refers to the settings of one or more status flags. In the x86 and
many other architectures, status flags are set by arithmetic and compare operations.
The compare operation in most languages subtracts two operands, as does a subtract
operation. The difference is that a compare operation only sets status flags, whereas a
subtract operation also stores the result of the subtraction in the destination operand.
Some architectures also set status flags for data transfer instructions.
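The distinction between compare and subtract can be sketched in Python. This is a simplified 8-bit model, not the x86 definition: the function names and the dictionary-based register file are hypothetical, and only four flags (Z, S, C, O) are modeled.

```python
def sub8(a, b):
    """Return (result, flags) for the 8-bit subtraction a - b (a, b in 0..255)."""
    result = (a - b) & 0xFF
    flags = {
        "Z": result == 0,                            # zero
        "S": bool(result & 0x80),                    # sign bit of the result
        "C": a < b,                                  # borrow, operands as unsigned
        "O": bool((a ^ b) & (a ^ result) & 0x80),    # signed overflow
    }
    return result, flags


def subtract(regs, flags, dst, src):
    # Subtract: sets the flags AND stores the difference in the destination.
    result, new_flags = sub8(regs[dst], regs[src])
    regs[dst] = result
    flags.update(new_flags)


def compare(regs, flags, dst, src):
    # Compare: performs the same subtraction but keeps only the flags.
    _, new_flags = sub8(regs[dst], regs[src])
    flags.update(new_flags)


regs = {"AL": 5, "BL": 5}
flags = {}
compare(regs, flags, "AL", "BL")
after_compare = dict(regs)        # destination is unchanged by compare
subtract(regs, flags, "AL", "BL")
```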
Table 12.9 lists the status flags used on the x86. Each flag, or combinations of
these flags, can be tested for a conditional jump. Table 12.10 shows the condition
codes (combinations of status flag values) for which conditional jump opcodes have
been defined.
Several interesting observations can be made about this list. First, we may
wish to test two operands to determine if one number is bigger than another. But
this will depend on whether the numbers are signed or unsigned. For example, the
8-bit number 11111111 is bigger than 00000000 if the two numbers are interpreted
as unsigned integers (255 > 0) but is less if they are considered as 8-bit twos complement numbers (-1 < 0). Many assembly languages therefore introduce two sets
of terms to distinguish the two cases: If we are comparing two numbers as signed
integers, we use the terms less than and greater than; if we are comparing them as
unsigned integers, we use the terms below and above.
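The example can be checked directly. In this short Python sketch the helper names are hypothetical; the same bit pattern is interpreted once as an unsigned integer and once as an 8-bit twos complement integer.

```python
def as_unsigned(bits):
    """Interpret the low 8 bits as an unsigned integer (0..255)."""
    return bits & 0xFF


def as_signed(bits):
    """Interpret the low 8 bits as a twos complement integer (-128..127)."""
    bits &= 0xFF
    return bits - 256 if bits & 0x80 else bits


x, y = 0b11111111, 0b00000000
is_above = as_unsigned(x) > as_unsigned(y)   # unsigned view: "above"
is_less = as_signed(x) < as_signed(y)        # signed view: "less than"
```

Both results are true for the same pair of bit patterns, which is why separate terms (and separate conditional jumps) are needed for the two interpretations.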
A second observation concerns the complexity of comparing signed integers.
A signed result is greater than or equal to zero if (1) the sign bit is zero and there is
no overflow (S = 0 AND O = 0), or (2) the sign bit is one and there is an overflow (S = 1 AND O = 1).
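Both cases reduce to the single test S = O. The following Python sketch checks this exhaustively for 8-bit operands; it assumes the flags are produced by a compare (a subtraction whose result is discarded), and the function names are hypothetical.

```python
def flags_after_compare(a, b):
    """Return (S, O) after the 8-bit subtraction a - b (a, b in 0..255)."""
    result = (a - b) & 0xFF
    s = bool(result & 0x80)                        # sign bit of the result
    o = bool((a ^ b) & (a ^ result) & 0x80)        # signed overflow
    return s, o


def to_signed(bits):
    """Interpret an 8-bit pattern as a twos complement integer."""
    return bits - 256 if bits & 0x80 else bits


def signed_greater_or_equal(a, b):
    # "Greater than or equal" holds exactly when S and O agree.
    s, o = flags_after_compare(a, b)
    return s == o


# Compare the flag-based test with ordinary signed comparison for all pairs.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if signed_greater_or_equal(a, b) != (to_signed(a) >= to_signed(b))
]
```

The list of mismatches is empty, so the flag-based test agrees with signed comparison on every pair of 8-bit values.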
X86 SIMD INSTRUCTIONS In 1996, Intel introduced MMX technology into its
Pentium product line. MMX is a set of highly optimized instructions for multimedia tasks.
There are 57 new instructions that treat data in a SIMD (single-instruction, multiple-data) fashion, which makes it possible to perform the same operation, such as addition
or multiplication, on multiple data elements at once. Each instruction typically takes a
single clock cycle to execute. For the proper application, these fast parallel operations
can yield a speedup of two to eight times over comparable algorithms that do not use
the MMX instructions [ATKI96]. With the introduction of 64-bit x86 architecture,
Intel has expanded this extension to include double quadword (128 bits) operands and
floating-point operations. In this subsection, we describe the MMX features.
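The idea of one operation applied to several data elements at once can be modeled in Python. This sketch assumes eight unsigned 8-bit lanes packed into a 64-bit quadword and wraparound arithmetic within each lane; the function name is hypothetical, and real hardware performs all lanes in parallel rather than in a loop.

```python
def packed_add_bytes(x, y):
    """Add eight 8-bit lanes of two 64-bit values, wrapping within each lane."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (x >> shift) & 0xFF
        b = (y >> shift) & 0xFF
        # The carry out of a lane is discarded, so it never reaches the next lane.
        result |= ((a + b) & 0xFF) << shift
    return result


packed_sum = packed_add_bytes(0x01020304050607FF, 0x0101010101010101)
```

The lowest lane wraps from 0xFF to 0x00 without disturbing its neighbor, which an ordinary 64-bit addition would not guarantee.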
The focus of MMX is multimedia programming. Video and audio data are typically composed of large arrays of small data types, such as 8 or 16 bits, whereas conventional instructions are tailored to operate on 32- or 64-bit data. Here are some
examples: In graphics and video, a single scene consists of an array of pixels, and
This calculation is performed on each pixel position in A and B. If a series
of video frames is produced while gradually changing the fade value from 1 to 0
(scaled appropriately for an 8-bit integer), the result is to fade from image A to
image B.
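The per-pixel calculation is not reproduced in the text above, so the following Python sketch assumes the usual linear dissolve, result = A × fade + B × (1 − fade), with fade scaled to the integer range 0 to 255. The function name and the sample pixel values are hypothetical.

```python
def dissolve(a_pixels, b_pixels, fade):
    """Blend two equal-length lists of 8-bit values; fade is an integer 0..255.

    fade = 255 yields image A, fade = 0 yields image B (assumed linear blend).
    """
    blended = []
    for a, b in zip(a_pixels, b_pixels):
        # Each product is at most 255 * 255 = 65025, so it fits in 16 bits.
        blended.append((a * fade + b * (255 - fade)) // 255)
    return blended


image_a = [200, 10]
image_b = [0, 250]
frames = [dissolve(image_a, image_b, fade) for fade in (255, 128, 0)]
```

Stepping fade from 255 down to 0 produces the sequence of frames that moves from image A to image B.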
Figure 12.11 shows the sequence of steps required for one set of pixels. The
8-bit pixel components are converted to 16-bit elements to accommodate the
MMX 16-bit multiply capability. If these images use 640 × 480 resolution, and
the dissolve technique uses all 255 possible values of the fade value, then the total
number of instructions executed using MMX is 535 million. The same calculation,
performed without the MMX instructions, requires 1.4 billion instruction executions [INTE98].
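The quoted totals can be put in per-pixel terms with a little arithmetic. The Python sketch below assumes one update per pixel position per fade value; the per-update figures are derived here from the totals and are not stated in the source.

```python
WIDTH, HEIGHT = 640, 480
FADE_STEPS = 255
MMX_INSTRUCTIONS = 535e6       # total quoted for the MMX version
SCALAR_INSTRUCTIONS = 1.4e9    # total quoted for the non-MMX version

# Number of pixel positions processed over the whole dissolve.
pixel_updates = WIDTH * HEIGHT * FADE_STEPS

mmx_per_update = MMX_INSTRUCTIONS / pixel_updates
scalar_per_update = SCALAR_INSTRUCTIONS / pixel_updates
reduction = SCALAR_INSTRUCTIONS / MMX_INSTRUCTIONS
```

Under that assumption, the MMX version executes roughly 7 instructions per pixel update against roughly 18 without MMX, a reduction in instruction count of about 2.6 times.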
The ARM architecture provides a large collection of operation types. The following
are the principal categories:
• Load and store instructions: In the ARM architecture, only load and store
instructions access memory locations; arithmetic and logical instructions are
performed only on registers and immediate values encoded in the instruction.
This limitation is characteristic of RISC design and it is explored further in
Chapter 15. The ARM architecture supports two broad types of instruction
that load or store the value of a single register, or a pair of registers, from or to
memory: (1) load or store a 32-bit word or an 8-bit unsigned byte, and (2) load
or store a 16-bit unsigned halfword, and load and sign extend a 16-bit halfword
or an 8-bit byte.
• Branch instructions: ARM supports a branch instruction that allows a conditional branch forwards or backwards up to 32 MB. As the program counter
is one of the general-purpose registers (R15), a branch or jump can also be
generated by writing a value to R15. A subroutine call can be performed by
a variant of the standard branch instruction. As well as allowing a branch
forward or backward up to 32 MB, the Branch with Link (BL) instruction
preserves the address of the instruction after the branch (the return address)
in the LR (R14). Branches are determined by a 4-bit condition field in the
instruction.
• Data-processing instructions: This category includes logical instructions
(AND, OR, XOR), add and subtract instructions, and test and compare
instructions.
• Multiply instructions: The integer multiply instructions operate on word or
halfword operands and can produce normal or long results. For example,
there is a multiply instruction that takes two 32-bit operands and produces a
64-bit result.
• Parallel addition and subtraction instructions: In addition to the normal data
processing and multiply instructions, there is a set of parallel addition and
subtraction instructions, in which portions of two operands are operated on
in parallel. For example, ADD16 adds the top halfwords of two registers to
form the top halfword of the result and adds the bottom halfwords of the
same two registers to form the bottom halfword of the result. These instructions are useful in image processing applications, similar to the x86 MMX
instructions.
• Extend instructions: There are several instructions for unpacking data by sign
or zero extending bytes to halfwords or words, and halfwords to words.
• Status register access instructions: ARM provides the ability to read and also
to write portions of the status register.
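The parallel addition described in the list above can be modeled in a few lines of Python. This is a sketch of the behavior as described, assuming 32-bit registers and wraparound within each halfword; the function name add16 simply follows the name used in the text.

```python
def add16(rn, rm):
    """Add the top and bottom halfwords of two 32-bit values independently."""
    top = ((rn >> 16) + (rm >> 16)) & 0xFFFF
    bottom = ((rn & 0xFFFF) + (rm & 0xFFFF)) & 0xFFFF
    # The carry out of the bottom halfword is dropped, not added to the top.
    return (top << 16) | bottom


parallel_sum = add16(0x0001FFFF, 0x00020001)
ordinary_sum = (0x0001FFFF + 0x00020001) & 0xFFFFFFFF
```

The two results differ because an ordinary 32-bit addition lets the carry from the bottom halfword propagate into the top halfword, whereas the parallel form keeps the two halves independent.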
CONDITION CODES The ARM architecture defines four condition flags that
are stored in the program status register: N, Z, C, and V (Negative, Zero, Carry
and oVerflow), with meanings essentially the same as the S, Z, C, and O flags
in the x86 architecture. These four flags constitute a condition code in ARM.
Table 12.12 shows the combination of conditions for which conditional execution
is defined.
There are two unusual aspects to the use of condition codes in ARM:
1. All instructions, not just branch instructions, include a condition code field,
which means that virtually all instructions may be conditionally executed. Any
combination of flag settings except 1110 or 1111 in an instruction’s condition
code field signifies that the instruction will be executed only if the condition
is met.
2. All data processing instructions (arithmetic, logical) include an S bit that signifies whether the instruction updates the condition flags.
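Both aspects can be illustrated with a small Python model. This is a sketch, not a description of any particular ARM implementation: the function names and the dictionary-based register file are hypothetical, the condition encodings follow the standard ARM assignments (0000 = EQ through 1110 = AL), and the 1111 encoding is deliberately left unmodeled.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate a 4-bit condition field against the N, Z, C, V flags."""
    if cond == 0b1110:
        return True              # AL: always execute
    if cond == 0b1111:
        raise ValueError("1111 is not modeled in this sketch")
    tests = [
        z,                       # 000x: EQ / NE
        c,                       # 001x: CS / CC
        n,                       # 010x: MI / PL
        v,                       # 011x: VS / VC
        c and not z,             # 100x: HI / LS
        n == v,                  # 101x: GE / LT
        (not z) and n == v,      # 110x: GT / LE
    ]
    result = tests[cond >> 1]
    # The low bit of the field selects the inverse condition.
    return result if cond & 1 == 0 else not result


def add_instruction(regs, flags, cond, s_bit, rd, rn, rm):
    """Conditionally execute rd = rn + rm; update flags only if s_bit is set."""
    if not condition_passed(cond, **flags):
        return                   # condition failed: no effect at all
    a, b = regs[rn], regs[rm]
    total = a + b
    result = total & 0xFFFFFFFF
    regs[rd] = result
    if s_bit:
        flags["n"] = bool(result >> 31)
        flags["z"] = result == 0
        flags["c"] = total > 0xFFFFFFFF
        flags["v"] = bool(~(a ^ b) & (a ^ result) & 0x80000000)


regs = {"r0": 0xFFFFFFFF, "r1": 1, "r2": 7}
flags = {"n": False, "z": False, "c": False, "v": False}
add_instruction(regs, flags, 0b1110, True, "r2", "r0", "r1")   # always, S set
add_instruction(regs, flags, 0b0001, False, "r2", "r1", "r1")  # NE: skipped, Z is set
skipped_value = regs["r2"]
add_instruction(regs, flags, 0b0000, False, "r2", "r1", "r1")  # EQ: runs, flags kept
```

The second instruction has no effect because its condition fails, and the third changes the register without disturbing the flags because its S bit is clear.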
The use of conditional execution and conditional setting of the condition flags
helps in the design of shorter programs that use less memory. On the other hand,
all instructions include 4 bits for the condition code, so there is a trade-off in that
fewer bits in the 32-bit instruction are available for opcode and operands. Because
the ARM is a RISC design that relies heavily on register addressing, this seems to
be a reasonable trade-off.