My Blogspot: Januari 2017

Archive for Januari 2017

Intel x86 and ARM Operation Types

x86 Operation Types
The x86 provides a complex array of operation types, including a number of specialized instructions. The intent was to provide tools for the compiler writer to produce
optimized machine language translation of high-level language programs. Table 12.8
lists the types and gives examples of each. Most of these are the conventional
instructions found in most machine instruction sets, but several types of instructions
are tailored to the x86 architecture and are of particular interest. Appendix A of

[CART06] lists the x86 instructions, together with the operands for each and the
effect of the instruction on the condition codes. Appendix B of the NASM assembly
language manual provides a more detailed description of each x86 instruction. Both
documents are available at this book’s Web site.
CALL/RETURN INSTRUCTIONS The x86 provides four instructions to support
procedure call/return: CALL, ENTER, LEAVE, RETURN. It will be instructive to
look at the support provided by these instructions. Recall from Figure 12.10 that a
common means of implementing the procedure call/return mechanism is via the use
of stack frames. When a new procedure is called, the following must be performed
upon entry to the new procedure:
• Push the return point on the stack.
• Push the current frame pointer on the stack.
• Copy the stack pointer as the new value of the frame pointer.
• Adjust the stack pointer to allocate a frame.

The CALL instruction pushes the current instruction pointer value onto the stack
and causes a jump to the entry point of the procedure by placing the address of the
entry point in the instruction pointer. In the 8088 and 8086 machines, the typical
procedure began with the sequence

PUSH EBP

MOV EBP, ESP

SUB ESP, space_for_locals

where EBP is the frame pointer and ESP is the stack pointer. In the 80286 and later
machines, the ENTER instruction performs all the aforementioned operations in a
single instruction.
The ENTER instruction was added to the instruction set to provide direct support for the compiler. The instruction also includes a feature for support of what are
called nested procedures in languages such as Pascal, COBOL, and Ada (not found
in C or FORTRAN). It turns out that there are better ways of handling nested
procedure calls for these languages. Furthermore, although the ENTER instruction saves a few bytes of memory compared with the PUSH, MOV, SUB sequence
(4 bytes versus 6 bytes), it actually takes longer to execute (10 clock cycles versus
6 clock cycles). Thus, although it may have seemed a good idea to the instruction
set designers to add this feature, it complicates the implementation of the processor
while providing little or no benefit. We will see that, in contrast, a RISC approach

to processor design would avoid complex instructions such as ENTER and might
produce a more efficient implementation with a sequence of simpler instructions.
MEMORY MANAGEMENT Another set of specialized instructions deals with memory
segmentation. These are privileged instructions that can only be executed from the
operating system. They allow local and global segment tables (called descriptor tables)
to be loaded and read, and for the privilege level of a segment to be checked and altered.
The special instructions for dealing with the on-chip cache were discussed in
Chapter 4.

STATUS FLAGS AND CONDITION CODES Status flags are bits in special registers
that may be set by certain operations and used in conditional branch instructions. The
term condition code refers to the settings of one or more status flags. In the x86 and
many other architectures, status flags are set by arithmetic and compare operations.
The compare operation in most languages subtracts two operands, as does a subtract
operation. The difference is that a compare operation only sets status flags, whereas a
subtract operation also stores the result of the subtraction in the destination operand.
Some architectures also set status flags for data transfer instructions.
Table 12.9 lists the status flags used on the x86. Each flag, or combinations of
these flags, can be tested for a conditional jump. Table 12.10 shows the condition
codes (combinations of status flag values) for which conditional jump opcodes have
been defined.
Several interesting observations can be made about this list. First, we may
wish to test two operands to determine if one number is bigger than another. But
this will depend on whether the numbers are signed or unsigned. For example, the
8-bit number 11111111 is bigger than 00000000 if the two numbers are interpreted
as unsigned integers (255 7 0) but is less if they are considered as 8-bit twos complement numbers (-1 6 0). Many assembly languages therefore introduce two sets
of terms to distinguish the two cases: If we are comparing two numbers as signed
integers, we use the terms less than and greater than; if we are comparing them as
unsigned integers, we use the terms below and above.
A second observation concerns the complexity of comparing signed integers.
A signed result is greater than or equal to zero if (1) the sign bit is zero and there is
no overflow (S = 0 AND O = 0), or (2) the sign bit is one and there is an overflow.

A study of Figure 10.4 should convince you that the conditions tested for the various signed operations are appropriate.
X86 SIMD INSTRUCTIONS In 1996, Intel introduced MMX technology into its
Pentium product line. MMX is set of highly optimized instructions for multimedia tasks.
There are 57 new instructions that treat data in a SIMD (single-instruction, multipledata) fashion, which makes it possible to perform the same operation, such as addition
or multiplication, on multiple data elements at once. Each instruction typically takes a
single clock cycle to execute. For the proper application, these fast parallel operations
can yield a speedup of two to eight times over comparable algorithms that do not use
the MMX instructions [ATKI96]. With the introduction of 64-bit x86 architecture,
Intel has expanded this extension to include double quadword (128 bits) operands and
floating-point operations. In this subsection, we describe the MMX features.
The focus of MMX is multimedia programming. Video and audio data are typically composed of large arrays of small data types, such as 8 or 16 bits, whereas conventional instructions are tailored to operate on 32- or 64-bit data. Here are some
examples: In graphics and video, a single scene consists of an array of pixels,2 and

This calculation is performed on each pixel position in A and B. If a series
of video frames is produced while gradually changing the fade value from 1 to 0
(scaled appropriately for an 8-bit integer), the result is to fade from image A to
image B.

Figure 12.11 shows the sequence of steps required for one set of pixels. The
8-bit pixel components are converted to 16-bit elements to accommodate the
MMX 16-bit multiply capability. If these images use 640 * 480 resolution, and
the dissolve technique uses all 255 possible values of the fade value, then the total
number of instructions executed using MMX is 535 million. The same calculation,
performed without the MMX instructions, requires 1.4 billion instruction executions [INTE98]

ARM Operation Types

The ARM architecture provides a large collection of operation types. The following
are the principal categories:
• Load and store instructions: In the ARM architecture, only load and store
instructions access memory locations; arithmetic and logical instructions are
performed only on registers and immediate values encoded in the instruction.
This limitation is characteristic of RISC design and it is explored further in
Chapter 15. The ARM architecture supports two broad types of instruction
that load or store the value of a single register, or a pair of registers, from or to
memory: (1) load or store a 32-bit word or an 8-bit unsigned byte, and (2) load
or store a 16-bit unsigned halfword, and load and sign extend a 16-bit halfword
or an 8-bit byte.
• Branch instructions: ARM supports a branch instruction that allows a conditional branch forwards or backwards up to 32 MB. As the program counter
is one of the general-purpose registers (R15), a branch or jump can also be
generated by writing a value to R15. A subroutine call can be performed by
a variant of the standard branch instruction. As well as allowing a branch
f orward or backward up to 32 MB, the Branch with Link (BL) instruction
preserves the address of the instruction after the branch (the return address)
in the LR (R14). Branches are determined by a 4-bit condition field in the
instruction.
• Data-processing instructions: This category includes logical instructions
(AND, OR, XOR), add and subtract instructions, and test and compare
instructions.
• Multiply instructions: The integer multiply instructions operate on word or
halfword operands and can produce normal or long results. For example,
there is a multiply instruction that takes two 32-bit operands and produces a
64-bit result.
• Parallel addition and subtraction instructions: In addition to the normal data
processing and multiply instructions, there are a set of parallel addition and
subtraction instructions, in which portions of two operands are operated on
in parallel. For example, ADD16 adds the top halfwords of two registers to
form the top halfword of the result and adds the bottom halfwords of the
same two registers to form the bottom halfword of the result. These instructions are useful in image processing applications, similar to the x86 MMX
instructions.
• Extend instructions: There are several instructions for unpacking data by sign
or zero extending bytes to halfwords or words, and halfwords to words.
• Status register access instructions: ARM provides the ability to read and also
to write portions of the status register.
CONDITION CODES The ARM architecture defines four condition flags that
are stored in the program status register: N, Z, C, and V (Negative, Zero, Carry
and oVerflow), with meanings essentially the same as the S, Z, C, and V flags

in the x86 architecture. These four flags constitute a condition code in ARM.
Table 12.12 shows the combination of conditions for which conditional execution
is defined.
There are two unusual aspects to the use of condition codes in ARM:
1. All instructions, not just branch instructions, include a condition code field,
which means that virtually all instructions may be conditionally executed. Any
combination of flag settings except 1110 or 1111 in an instruction’s condition
code field signifies that the instruction will be executed only if the condition
is met.
2. All data processing instructions (arithmetic, logical) include an S bit that signifies whether the instruction updates the condition flags.
The use of conditional execution and conditional setting of the condition flags
helps in the design of shorter programs that use less memory. On the other hand,
all instructions include 4 bits for the condition code, so there is a trade-off in that
fewer bits in the 32-bit instruction are available for opcode and operands. Because
the ARM is a RISC design that relies heavily on register addressing, this seems to
be a reasonable trade-off.

Types of Operations

Afi Qurrahman 04.15

Afi Qurrahman

Chapter 12 0

The number of different opcodes varies widely from machine to machine. However,
the same general types of operations are found on all machines. A useful and typical
categorization is the following:
• Data transfer
• Arithmetic
• Logical
• Conversion
• I/O
• System control
• Transfer of control
Table 12.3 (based on [HAYE98]) lists common instruction types in each category. This section provides a brief survey of these various types of operations,
together with a brief discussion of the actions taken by the processor to execute a
particular type of operation (summarized in Table 12.4). The latter topic is examined
in more detail in Chapter 14.

Data Transfer

The most fundamental type of machine instruction is the data transfer instruction.
The data transfer instruction must specify several things. First, the location of the
source and destination operands must be specified. Each location could be memory,
a register, or the top of the stack. Second, the length of data to be transferred must
be indicated. Third, as with all instructions with operands, the mode of addressing
for each operand must be specified. This latter point is discussed in Chapter 13.
The choice of data transfer instructions to include in an instruction set exemplifies the kinds of trade-offs the designer must make. For example, the general
location (memory or register) of an operand can be indicated in either the specification of the opcode or the operand. Table 12.5 shows examples of the most common
IBM EAS/390 data transfer instructions. Note that there are variants to indicate.

the amount of data to be transferred (8, 16, 32, or 64 bits). Also, there are different
instructions for register to register, register to memory, memory to register, and
memory to memory transfers. In contrast, the VAX has a move (MOV) instruction
with variants for different amounts of data to be moved, but it specifies whether an
operand is register or memory as part of the operand. The VAX approach is somewhat easier for the programmer, who has fewer mnemonics to deal with. However,
it is also somewhat less compact than the IBM EAS/390 approach because the location (register versus memory) of each operand must be specified separately in the
instruction. We will return to this distinction when we discuss instruction formats in
Chapter 13.
In terms of processor action, data transfer operations are perhaps the simplest
type. If both source and destination are registers, then the processor simply causes
data to be transferred from one register to another; this is an operation internal to
the processor. If one or both operands are in memory, then the processor must perform some or all of the following actions:
1. Calculate the memory address, based on the address mode (discussed in
Chapter 13).
2. If the address refers to virtual memory, translate from virtual to real memory
address.
3. Determine whether the addressed item is in cache.
4. If not, issue a command to the memory module.

Arithmetic

Most machines provide the basic arithmetic operations of add, subtract, multiply, and divide. These are invariably provided for signed integer (fixed-point)
numbers. Often they are also provided for floating-point and packed decimal
numbers.
Other possible operations include a variety of single-operand instructions; for
example,
• Absolute: Take the absolute value of the operand.
• Negate: Negate the operand.
• Increment: Add 1 to the operand.
• Decrement: Subtract 1 from the operand.
The execution of an arithmetic instruction may involve data transfer operations to position operands for input to the ALU, and to deliver the output of the
ALU. Figure 3.5 illustrates the movements involved in both data transfer and arithmetic operations. In addition, of course, the ALU portion of the processor performs
the desired operation.

Logical

Most machines also provide a variety of operations for manipulating individual bits
of a word or other addressable units, often referred to as “bit twiddling.” They are
based upon Boolean operations (see Chapter 11).
Some of the basic logical operations that can be performed on Boolean or
binary data are shown in Table 12.6. The NOT operation inverts a bit. AND, OR,
and Exclusive-OR (XOR) are the most common logical functions with two operands. EQUAL is a useful binary test.
These logical operations can be applied bitwise to n-bit logical data units.
Thus, if two registers contain the data

(R1) = 10100101

(R2) = 00001111

then

(R1) AND (R2) = 00000101

where the notation (X) means the contents of location X. Thus, the AND operation
can be used as a mask that selects certain bits in a word and zeros out the remaining
bits. As another example, if two registers contain

(R1) = 10100101

(R2) = 11111111

then

(R1) XOR (R2) = 01011010

With one word set to all 1s, the XOR operation inverts all of the bits in the other
word (ones complement).
In addition to bitwise logical operations, most machines provide a variety of
shifting and rotating functions. The most basic operations are illustrated in Figure 12.6.
With a logical shift, the bits of a word are shifted left or right. On one end, the bit
shifted out is lost. On the other end, a 0 is shifted in. Logical shifts are useful primarily for isolating fields within a word. The 0s that are shifted into a word displace
unwanted information that is shifted off the other end.

As an example, suppose we wish to transmit characters of data to an I/O
device 1 character at a time. If each memory word is 16 bits in length and contains
two characters, we must unpack the characters before they can be sent. To send the
two characters in a word,
1. Load the word into a register.
2. Shift to the right eight times. This shifts the remaining character to the right
half of the register.
3. Perform I/O. The I/O module reads the lower-order 8 bits from the data bus.

The preceding steps result in sending the left-hand character. To send the righthand character,

1. Load the word again into the register.
2. AND with 0000000011111111. This masks out the character on the left.
3. Perform I/O.

The arithmetic shift operation treats the data as a signed integer and does
not shift the sign bit. On a right arithmetic shift, the sign bit is replicated into the
bit position to its right. On a left arithmetic shift, a logical left shift is performed on
all bits but the sign bit, which is retained. These operations can speed up certain
arithmetic operations. With numbers in twos complement notation, a right arithmetic shift corresponds to a division by 2, with truncation for odd numbers. Both an
arithmetic left shift and a logical left shift correspond to a multiplication by 2 when
there is no overflow. If overflow occurs, arithmetic and logical left shift operations
produce different results, but the arithmetic left shift retains the sign of the number.
Because of the potential for overflow, many processors do not include this instruction, including PowerPC and Itanium. Others, such as the IBM EAS/390, do offer
the instruction. Curiously, the x86 architecture includes an arithmetic left shift but
defines it to be identical to a logical left shift.
Rotate, or cyclic shift, operations preserve all of the bits being operated on.
One use of a rotate is to bring each bit successively into the leftmost bit, where it can
be identified by testing the sign of the data (treated as a number).
As with arithmetic operations, logical operations involve ALU activity and
may involve data transfer operations. Table 12.7 gives examples of all of the shift
and rotate operations discussed in this subsection.

Conversion

Conversion instructions are those that change the format or operate on the format of
data. An example is converting from decimal to binary. An example of a more complex editing instruction is the EAS/390 Translate (TR) instruction. This instruction
can be used to convert from one 8-bit code to another, and it takes three operands:

TR R1 (L), R2

The operand R2 contains the address of the start of a table of 8-bit codes. The
L bytes starting at the address specified in R1 are translated, each byte being
replaced by the contents of a table entry indexed by that byte. For example, to
translate from EBCDIC to IRA, we first create a 256-byte table in storage locations, say, 1000-10FF hexadecimal. The table contains the characters of the IRA
code in the sequence of the binary representation of the EBCDIC code; that is, the
IRA code is placed in the table at the relative location equal to the binary value of
the EBCDIC code of the same character. Thus, locations 10F0 through 10F9 will
contain the values 30 through 39, because F0 is the EBCDIC code for the digit 0,
and 30 is the IRA code for the digit 0, and so on through digit 9. Now suppose we
have the EBCDIC for the digits 1984 starting at location 2100 and we wish to translate to IRA. Assume the following:

• Locations 2100–2103 contain F1 F9 F8 F4.
• R1 contains 2100.
• R2 contains 1000.
Then, if we execute

TR R1 (4), R2

locations 2100–2103 will contain 31 39 38 34.

Input/Output

Input/output instructions were discussed in some detail in Chapter 7. As we saw,
there are a variety of approaches taken, including isolated programmed I/O,
memory-mapped programmed I/O, DMA, and the use of an I/O processor. Many
implementations provide only a few I/O instructions, with the specific actions specified by parameters, codes, or command words.

System Control

System control instructions are those that can be executed only while the processor
is in a certain privileged state or is executing a program in a special privileged area
of memory. Typically, these instructions are reserved for the use of the operating
system.
Some examples of system control operations are as follows. A system control instruction may read or alter a control register; we discuss control registers in
Chapter 14. Another example is an instruction to read or modify a storage protection key, such as is used in the EAS/390 memory system. Another example is access
to process control blocks in a multiprogramming system.

Transfer of Control
For all of the operation types discussed so far, the next instruction to be performed
is the one that immediately follows, in memory, the current instruction. However, a
significant fraction of the instructions in any program have as their function changing the sequence of instruction execution. For these instructions, the operation performed by the processor is to update the program counter to contain the address of
some instruction in memory.
There are a number of reasons why transfer-of-control operations are
required. Among the most important are the following:
1. In the practical use of computers, it is essential to be able to execute each
instruction more than once and perhaps many thousands of times. It may
require thousands or perhaps millions of instructions to implement an application. This would be unthinkable if each instruction had to be written out separately. If a table or a list of items is to be processed, a program loop is needed.
One sequence of instructions is executed repeatedly to process all the data.
2. Virtually all programs involve some decision making. We would like the computer to do one thing if one condition holds, and another thing if another condition
holds. For example, a sequence of instructions computes the square root of a number. At the start of the sequence, the sign of the number is tested. If the number
is negative, the computation is not performed, but an error condition is reported.
3. To compose correctly a large or even medium-size computer program is an
exceedingly difficult task. It helps if there are mechanisms for breaking the
task up into smaller pieces that can be worked on one at a time.

We now turn to a discussion of the most common transfer-of-control operations found in instruction sets: branch, skip, and procedure call.
BRANCH INSTRUCTIONS A branch instruction, also called a jump instruction,
has as one of its operands the address of the next instruction to be executed. Most
often, the instruction is a conditional branch instruction. That is, the branch is made
(update program counter to equal address specified in operand) only if a certain
condition is met. Otherwise, the next instruction in sequence is executed (increment
program counter as usual). A branch instruction in which the branch is always taken
is an unconditional branch.
There are two common ways of generating the condition to be tested in a conditional branch instruction. First, most machines provide a 1-bit or multiple-bit condition code that is set as the result of some operations. This code can be thought
of as a short user-visible register. As an example, an arithmetic operation (ADD,
SUBTRACT, and so on) could set a 2-bit condition code with one of the following
four values: 0, positive, negative, overflow. On such a machine, there could be four
different conditional branch instructions:

BRP X Branch to location X if result is positive.

BRN X Branch to location X if result is negative.

BRZ X Branch to location X if result is zero.

BRO X Branch to location X if overflow occurs.

In all of these cases, the result referred to is the result of the most recent
operation that set the condition code.
Another approach that can be used with a three-address instruction format is
to perform a comparison and specify a branch in the same instruction. For example,

BRE R1, R2, X Branch to X if contents of R1 = contents of R2.

Figure 12.7 shows examples of these operations. Note that a branch can be
either forward (an instruction with a higher address) or backward (lower address).
The example shows how an unconditional and a conditional branch can be used to
create a repeating loop of instructions. The instructions in locations 202 through 210
will be executed repeatedly until the result of subtracting Y from X is 0.
SKIP INSTRUCTIONS Another form of transfer-of-control instruction is the skip
instruction. The skip instruction includes an implied address. Typically, the skip
implies that one instruction be skipped; thus, the implied address equals the address
of the next instruction plus one instruction length.
Because the skip instruction does not require a destination address field, it is
free to do other things. A typical example is the increment-and-skip-if-zero (ISZ)
instruction. Consider the following program fragment:

301

~ ~ ~

309 ISZ R1

310 BR 301

311

In this fragment, the two transfer-of-control instructions are used to implement
an iterative loop. R1 is set with the negative of the number of iterations to be
performed. At the end of the loop, R1 is incremented. If it is not 0, the program
branches back to the beginning of the loop. Otherwise, the branch is skipped, and
the program continues with the next instruction after the end of the loop

PROCEDURE CALL INSTRUCTIONS Perhaps the most important innovation in the
development of programming languages is the procedure. A procedure is a selfcontained computer program that is incorporated into a larger program. At any
point in the program the procedure may be invoked, or called. The processor is
instructed to go and execute the entire procedure and then return to the point from
which the call took place.
The two principal reasons for the use of procedures are economy and modularity. A procedure allows the same piece of code to be used many times. This is
important for economy in programming effort and for making the most efficient use
of storage space in the system (the program must be stored). Procedures also allow
large programming tasks to be subdivided into smaller units. This use of modularity
greatly eases the programming task.
The procedure mechanism involves two basic instructions: a call instruction
that branches from the present location to the procedure, and a return instruction
that returns from the procedure to the place from which it was called. Both of these
are forms of branching instructions.
Figure 12.8a illustrates the use of procedures to construct a program. In this
example, there is a main program starting at location 4000. This program includes
a call to procedure PROC1, starting at location 4500. When this call instruction is
encountered, the processor suspends execution of the main program and begins execution of PROC1 by fetching the next instruction from location 4500. Within PROC1,
there are two calls to PROC2 at location 4800. In each case, the execution of PROC1

is suspended and PROC2 is executed. The RETURN statement causes the processor to go back to the calling program and continue execution at the instruction after
the corresponding CALL instruction. This behavior is illustrated in Figure 12.8b.
Three points are worth noting:
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the nesting of procedures to an arbitrary depth.
3. Each procedure call is matched by a return in the called program.
Because we would like to be able to call a procedure from a variety of points,
the processor must somehow save the return address so that the return can take
place appropriately. There are three common places for storing the return address:
• Register
• Start of called procedure
• Top of stack
Consider a machine-language instruction CALL X, which stands for call procedure at location X. If the register approach is used, CALL X causes the following
actions:

RN v PC +

PC v X

where RN is a register that is always used for this purpose, PC is the program counter, and is the instruction length. The called procedure can now save the contents
of RN to be used for the later return.
A second possibility is to store the return address at the start of the procedure.
In this case, CALL X causes

X v PC +

PC v X + 1

This is quite handy. The return address has been stored safely away.
Both of the preceding approaches work and have been used. The only limitation of these approaches is that they complicate the use of reentrant procedures.
A reentrant procedure is one in which it is possible to have several calls open to it at
the same time. A recursive procedure (one that calls itself) is an example of the use
of this feature (see Appendix H). If parameters are passed via registers or memory
for a reentrant procedure, some code must be responsible for saving the parameters
so that the registers or memory space are available for other procedure calls.
A more general and powerful approach is to use a stack (see Appendix O
for a discussion of stacks). When the processor executes a call, it places the return
address on the stack. When it executes a return, it uses the address on the stack.
Figure 12.9 illustrates the use of the stack.
In addition to providing a return address, it is also often necessary to pass
parameters with a procedure call. These can be passed in registers. Another possibility is to store the parameters in memory just after the CALL instruction. In this
case, the return must be to the location following the parameters. Again, both of

these approaches have drawbacks. If registers are used, the called program and the
calling program must be written to assure that the registers are used properly. The
storing of parameters in memory makes it difficult to exchange a variable number of
parameters. Both approaches prevent the use of reentrant procedures.
A more flexible approach to parameter passing is the stack. When the processor executes a call, it not only stacks the return address, it stacks parameters to
be passed to the called procedure. The called procedure can access the parameters
from the stack. Upon return, return parameters can also be placed on the stack. The
entire set of parameters, including return address, that is stored for a procedure
invocation is referred to as a stack frame.
An example is provided in Figure 12.10. The example refers to procedure P
in which the local variables x1 and x2 are declared, and procedure Q, which P can
call and in which the local variables y1 and y2 are declared. In this figure, the return

point for each procedure is the first item stored in the corresponding stack frame.
Next is stored a pointer to the beginning of the previous frame. This is needed if the
number or length of parameters to be stacked is variable.

My Blogspot

Recommended Reading

Intel x86 and ARM Operation Types

Types of Operations