22 C FOR THE MICROPROCESSOR ENGINEER
Figure 2.2 Moving 16-bit data at 'one go'.
system operations of jumping to a subroutine and implementing an interrupt,
decrement the relevant Stack Pointer before moving data. As mentioned earlier,
the Push and Pull operations allow any register or set of registers to be pushed or
pulled into or out of a stack at one go. This facilitates the passing of arguments
to and from subroutines, and allows called subroutines to use registers without
corrupting register-held data in the calling program (see Section 5.2).
Figure 2.1 shows how the post-byte is calculated for a Push or a Pull. Specifically,
the System stack is shown; if the User stack is being employed then U is
replaced by S. Figure 2.3 shows a snapshot of memory after a Push onto the System
stack. If only a subset of registers is saved, then the same order is preserved
as in the diagram. The time taken for a Push or Pull is five cycles plus one cycle
per byte moved. In Fig. 2.3 all 12 bytes are moved, so this adds up to 5 + 12 = 17 cycles.
The 6809 implements the normal Add and Subtract operations, as shown in
Table 2.2, both with and without carry, targeted on an 8-bit Accumulator. A
16-bit Add and Subtract based on Accumulator_D is also provided, but
unfortunately not with a carry. The unsigned addition of Accumulator_B to the
16-bit X Index register (ABX) can also be classed as double precision, but the
8-bit addend is promoted to 16 bits at addition time by assuming an upper byte of
zero, hence the term unsigned. ABX has no operand field of its own; thus, for
example, if B holds 56h then ABX actually adds 0056h to X.
It is possible to promote a signed number in Accumulator_B to its 16-bit equivalent
in Accumulator_D by using the Sign EXtension instruction (SEX). This zeros
Accumulator_A if bit 7 of B is 0 and fills A with ones (A = FFh) if bit 7 of B is 1.
ITS INSTRUCTION SET 23
Figure 2.3 Stacking registers in memory using PSH and PUL. Also applicable to IRQ and NMI interrupts.
Table 2.2 Arithmetic operations
                                          Flags
Operation          Mnemonic         V N Z C     Description
Add                                 √ √ √ √     Binary addition
  to A; to B       ADDA; ADDB       √ √ √ √     [A] ← [A]+[M]; [B] ← [B]+[M]
calculates the effective address as [X] + 1 and loads it into the X Index register
([X] ← [X] + 1)
Table 2.3 Shifting instructions.
                                                       Flags
Operation                         Mnemonic             V N Z C     Description
Shift left, arithmetic or logic                                    Linear shift left into carry
  memory                          ASL                  1 √ √ b7    C ← b7 ... b0 ← 0
  A; B                            ASLA; ASLB           1 √ √ b7
Shift right, logic                                                 Linear shift right into carry
  memory                          LSR                  • √ √ b0    0 → b7 ... b0 → C
  A; B                            LSRA; LSRB           • √ √ b0
Shift right, arithmetic                                            As above but keeps sign bit
  memory                          ASR                  • √ √ b0    b7 → b7 ... b0 → C
  A; B                            ASRA; ASRB           • √ √ b0
Rotate left                                                        Circular shift left into carry
  memory                          ROL                  1 √ √ b7    C ← b7 ... b0 ← C
  A; B                            ROLA; ROLB           1 √ √ b7
Rotate right                                                       Circular shift right into carry
  memory                          ROR                  • √ √ b0    C → b7 ... b0 → C
  A; B                            RORA; RORB           • √ √ b0
Note 1: V = b7 ⊕ b6 before shift.
Circular or Rotate Shift instructions are similar to Add with Carry, in that they
can be used for multiple-precision operations. A Rotate takes in the Carry from
any previous Shift and in turn saves its ejected bit in the C flag. As an example,
a 24-bit word stored in locations M (bits 23 to 16), M+1 (bits 15 to 8) and M+2
(bits 7 to 0) can be shifted right once by the sequence [4]:
LSR M ; 0 → b23 ... b16 → C (b16 ends up in C)
ROR M+1 ; C → b15 ... b8 → C (old b16 into b15, b8 into C)
ROR M+2 ; C → b7 ... b0 → C (old b8 into b7, b0 into C)
In all types of Left Shifts, the oVerflow flag is set when bits 7 and 6 differ
before the shift (i.e. b7⊕b6), meaning that the (apparent) sign will change after
the shift.
The logic operations of AND, OR, Exclusive-OR and NOT (Complement) are
provided, as shown in Table 2.4. The only unusual feature here is the pair of special
instructions ANDCC and ORCC for clearing or setting flags in the Condition Code
register. Thus to clear the I mask (bit 4; see Fig. 1.1) we have:
ANDCC #11101111b ; Coded as 1C-EFh (equivalent to CLI)
and to set it:
ORCC #00010000b ; Coded as 1A-10h (equivalent to SEI)
This saves having to provide a series of separate instructions targeted at each
of the CCR flags and masks, such as the 6800's CLI and SEI (CLear and SEt
Interrupt mask), and also allows more than one flag to be set or cleared in a
single instruction.
Table 2.4 Logic instructions.
                                        Flags
Operation         Mnemonic        V N Z C     Description
AND                               0 √ √ •     Logic bitwise AND
  A; B            ANDA; ANDB      0 √ √ •     [A] ← [A]·[M]; [B] ← [B]·[M]
Table 2.5 Data test operations.
                                                 Flags
Operation                  Mnemonic        V N Z C     Description
Bit Test                                   0 √ √ •     Non-destructive AND
  A; B                     BITA; BITB      0 √ √ •     [A]·[M]; [B]·[M]
Compare                                    √ √ √ √     Non-destructive subtract
  with A; B                CMPA; CMPB      √ √ √ √     [A]−[M]; [B]−[M]
  with D                   CMPD            √ √ √ √     [D]−[M:M+1]
  with X; Y                CMPX; CMPY      √ √ √ √     [X]−[M:M+1]; [Y]−[M:M+1]
  with S; U                CMPS; CMPU      √ √ √ √     [S]−[M:M+1]; [U]−[M:M+1]
Test for Zero or Minus                     0 √ √ •     Non-destructive subtract of zero
  memory                   TST             0 √ √ •     [M]−00
  A; B                     TSTA; TSTB      0 √ √ •     [A]−00; [B]−00
ANDB #00100000b ; Clear all Accumulator B bits except 5 {C4-20h}
will set the Z flag if bit 5 is 0, otherwise Z will be cleared. Once again this is a
destructive examination, and the equivalent from Table 2.5 is BIT test; thus:
BITB #00100000b ; Coded as C5-20h
does the same thing, but with the contents of Accumulator_B remaining un-
changed; and more tests can subsequently be carried out without reloading.
Comparison of the magnitude of data in an Accumulator with either a constant
or data in memory requires a different approach. Mathematically this can be
done by subtracting [M] from [A] and checking the state of the flags. Which
flags are relevant depends on whether the numbers are to be treated as unsigned
(magnitude only) or signed. Taking the former first gives:
[A] Higher than [M] : [A]−[M] gives no Carry and non-Zero, C=0, Z=0 (C + Z = 0)
[A] Equal to [M] : [A]−[M] gives Zero (Z=1)
[A] Lower than [M] : [A]−[M] gives a Carry (C=1)
The signed situation is more complex, involving both the Negative and oVerflow
flags. Where a subtraction occurs and the difference is positive, then either
bit 7 will be 0 and there will be no overflow (both N and V are 0), or else an overflow
will occur with bit 7 at logic 1 (both N and V are 1). Logically, this is detected by
N⊕V = 0. A negative difference is signalled whenever there is no overflow and
the sign bit is 1 (N is 1 and V is 0), or else an overflow occurs together with a
positive sign bit (N is 0 and V is 1). Logically, this is N⊕V = 1. Based on these
outcomes we have:
[A] Greater than [M] : [A]−[M] → non-zero +ve result ((N⊕V) + Z = 0)
[A] Equal to [M] : [A]−[M] → zero (Z=1)
[A] Less than [M] : [A]−[M] → a negative result (N⊕V = 1)
Subtraction is a destructive test operation and Comparison is its non-destructive
counterpart. It is the most powerful of the Data Testing operations, as it can be
applied to both Index and Stack Pointer registers as well as 8- and 16-bit Accu-
mulators.
Table 2.6 Operations which affect the Program Counter.
Operation                   Mnemonic          Description
Branch (2's complement)     Bcc; LBcc         cc is the logical condition tested
  Always (True)             BRA; LBRA         Always affirmed regardless of flags
  Never (False)             BRN; LBRN         Never carried out
  Equal                     BEQ; LBEQ         Z flag set (Zero result)
  not Equal                 BNE; LBNE         Z flag clear (Non-zero result)
  Carry Set                 BCS; LBCS (1)     [Acc] Lower Than (Carry = 1)
  Carry Clear               BCC; LBCC (2)     [Acc] Higher or Same as (Carry = 0)
  Lower or Same             BLS; LBLS         [Acc] Lower or Same as (C+Z = 1)
  Higher Than               BHI; LBHI         [Acc] Higher Than (C+Z = 0)
  Minus                     BMI; LBMI         N flag set (Bit 7 = 1)
  Plus                      BPL; LBPL         N flag clear (Bit 7 = 0)
  Overflow Set              BVS; LBVS         V flag set
  Overflow Clear            BVC; LBVC         V flag clear
  Greater Than              BGT; LBGT         [Acc] Greater Than ((N⊕V) + Z = 0)
  Less Than or Equal        BLE; LBLE         [Acc] Less Than or Equal ((N⊕V) + Z = 1)
  Greater Than or Equal     BGE; LBGE         [Acc] Greater Than or Equal (N⊕V = 0)
  Less Than                 BLT; LBLT         [Acc] Less Than (N⊕V = 1)
Jump                        JMP               Absolute unconditional goto
No Operation                NOP               Only increments Program Counter
Note 1: Some assemblers allow the alternative BLO.
Note 2: Some assemblers allow the alternative BHS.
All Conditional operations in the 6809 are in the form of a Branch instruction.
These cause the Program Counter to skip a given number of places forwards or
backwards, usually depending on the state of the CCR flags. Excluding Branch to
SubRoutine (see Section 5.1), there are 16 Branches provided, which can be considered
as the True or False outcomes of eight flag combinations. Thus Branch if Carry Set (BCS)
and Branch if Carry Clear (BCC) are based on the one test (C = ?).
If the test is True, the offset following the Branch op-code is added to the
Program Counter. Thus if the Carry flag is zero:
E100:1 BCC +08 ; Coded as 24-08h
will add 0008h to the Program Counter state E102h to give PC = E10Ah. Note
that the PC is already pointing to the following instruction when execution occurs,
giving an effective destination of ten places on from the Branch location. The
Branch offset is sign extended before addition to the Program Counter; thus if
the N flag is zero:
E100:1 BPL -08 ; Coded as 2A-F8h
gives PC = E102h + FFF8h = E0FAh, six places back from the Branch location.
ADDRESS MODES 31
Table 2.7: (a) The M6809 instruction set (continued next page).
Table 2.7: (b) The M6809 instruction set (continued next page).
Table 2.7: (c) (continued). The M6809 instruction set. Reproduced by courtesy of Motorola Semiconductor Products Ltd.
inform the MPU's Control registers where this data is being held. There are a
few exceptions to this, the so-called Inherent operations, such as NOP (No OPeration)
and RTS (ReTurn from Subroutine). Single-byte instructions whose
operand is a single register, for example INCA (INCrement accumulator A), are
also sometimes classified as Inherent.
With the exception of Inherent instructions, the bytes following the op-code
are either the (constant) operand itself, or more usually a pointer to where the
operand can be found. We have already met the simplest of these, where the
absolute address itself follows, as in:
LDA 2000h ; [A] ← [2000h]
program would take 3072 cycles, whilst the loop equivalent takes considerably
longer at 4867 cycles to execute.
In the remainder of this section, we will look at the 6809 address modes. In
this catalog, op-code may be one or two bytes.
Inherent
op-code
All the operand information is contained in the op-code, with no specific address-
related bytes following. All of the 6809 inherent operations are one byte long
except SoftWare Interrupts 2 and 3 (SWI2 and SWI3), which carry a prefix byte. An example is NOP (No OPeration). Motorola
also classify most Register-Direct instructions as inherent, for example INCA (IN-
Crement A). Table 2.7 gives the Inherent instructions.
Register Direct, R
op-code post-byte
Information concerning the source register(s) and/or destination register(s) is
contained in a post-byte. For example TFR A,B (TransFeR the contents of A
to B) is coded as 0001 1111 1000 1001b (1F-89h). The post-byte here is divided
into two 4-bit fields. The left field specifies the source register, and the right the
destination. Each register is given its own 4-bit code. Thus 1000b is A
and 1001b is B. A list of codes is given on page 20. The Transfer, Exchange, Push,
and Pull operations come under this category. In Table 2.7 these are classified as
Immediate.
Immediate, #kk
op-code constant 8 bit
op-code constant 16 bit
With Immediate addressing, the byte or bytes following the op-code are constant
data and not a pointer to data. We have used this form of addressing before, in
the array argument routine in Table 2.8. Some examples are:
ADDB #30h ; Add the constant 30h to Acc. B {Coded as CB-30h}
LDX #2000h ; Put the constant 2000h in X {Coded as 8E-20-00h}
CMPY #21FFh ; Compare [Y] with the constant 21FFh {Coded as 10-8C-21-FFh}
The pound (hash) symbol # is commonly used to indicate a constant number.
Absolute, M
op-code DP offset Short (Direct)
op-code Address Long (Extended Direct)
In Absolute addressing, the address itself — either in whole or part — follows
the op-code. Motorola terms the long 16-bit address version as Extended Direct.
There is a short version just called Direct, where the effective address (ea) is the
concatenation of the Direct Page register with the byte following the op-code.
Thus if this register is set at, say, 80h, then the instruction LDA 08h, coded as
96-08h, effectively brings down the byte from address 8008h. Some assem-
blers have difficulty in deciding which of these forms to use. For example, in the
fragment above, should the assembler generate the code B6-80-08 (LDA 8008) or
96-08 (LDA 08)? After all, the setting of the DP register may have been altered in
a call to a subroutine yet to be linked in. There are ways around this, but none is
entirely satisfactory.
Absolute Indirect, [M]
op-code | 9Fh Pointer to address
Here the op-code is followed by a post-byte 9Fh and then a 16-bit address. This
is not the address of the operand but a pointer to where the operand address is
stored in memory. Thus, if the locations 2000:2001h hold the address 80-08h,
then the instruction:
LDA [2000h] ; [A] ← [8008h]
Branch Relative
op-code offset 8-bit (Short)
op-code offset 16-bit (Long)
We have already discussed this form of address mode in the previous section.
Regular (or short) Branches sign extend the following 8-bit offset, and add this to
the Program Counter. Effectively this means that offsets between 80h and FFh
are treated as negative. For example, the instruction BRA -06, coded as 20-FAh
(FAh is the 2's complement of 06h), is implemented as follows when the PC is at E108h:
  1110 0001 0000 1000 (PC) = E108h
+ 1111 1111 1111 1010 (offset) = FFFAh = −6
1 1110 0001 0000 0010 = E102h, which is E108h − 0006h (the final carry is discarded)
In calculating this offset, it must be remembered that the PC is already point-
ing to the next instruction. Thus the maximum forward point is (00)7Fh + 2 =
127 + 2 = 129 bytes from the op-code and (FF)80h + 2 = −128 + 2 = 126 bytes
back. Long Branches have a 16-bit offset and can range between +32,767 and −32,768
bytes from the following op-code, effectively anywhere in the full 64 kbyte ad-
dress space of memory that the processor can address at one time. Of course
Long Branch code is bigger and slower to execute (see Table 2.7(c) under the
column ~).
Indexed
The Absolute address modes are used where operands lie in fixed locations. In
many cases, this places an unacceptable restriction on the data structures which
can easily be processed. Compilers, for example, like to pass parameters in a
stack, and these should then be capable of being retrieved in locations relative
to the Stack Pointer. The 6800 MPU has a primitive form of computed effective
address (ea), where this could be up to +FFh (+255) bytes from the contents of
one Index register thus:
LDAA 8,X ; [A] ← [X+8]
op-code post-byte                ±n,R (5-bit offset, held within the post-byte)
op-code post-byte n              ±n,R (8-bit offset)
op-code post-byte n(hi) n(lo)    ±n,R (16-bit offset)
Here the effective address is R ± n where R is X, Y, S or U. The actual machine
code produced depends on the size of n, with a single post-byte capable of integrally
handling offsets from −16 to +15. This complex encoding scheme is worthwhile, as
most offsets are small; for example, an analysis has shown that 40% of this type
of indexing uses a zero offset [1]. Indirect Constant Offset Index does not have
a 5-bit offset version; the 8-bit (−128 to +127) variety is used instead. Fortunately the task
of evaluating the post-byte and following bytes is handled automatically by the
assembler.
Post-Auto-Increment / Pre-Auto-Decrement from Register
op-code post-byte ,R+ / ,R++ / ,-R / ,--R
As we saw in the listing of Table 2.8(b), indexing comes into its own when stepping
through blocks of memory, arrays and related structures. To avoid having to
follow (or lead) the use of the Index register with an Increment or Decrement,
this mode provides for automatic advance or retard; thus:
LDA ,R+ ; Bring down data byte and then increment Index register R
LDA ,R++ ; Bring down data byte and then increment Index register R twice
LDA ,-R ; Decrement Index register R and then bring down data byte
LDA ,--R ; Decrement Index register R twice and then bring down data byte
where R is X, Y, S or U. Notice that incrementing is done after and decrementing
before the Index register is used. Double Increment/Decrement modes are useful
when the arrays contain addresses or other double-byte data. Indirection is only
available for this double form since, by its nature, indirection accesses addresses,
which are double-byte quantities.
As an example of these modes, consider the problem of multiplying two 256-byte
arrays element by element to give a 256-element double-byte array. If array_1 begins
at 2000h with the second array following directly, and the product array commences
at 3000h, then we have:
LDX #2000h ; Point IX to array_1[0]
LDY #3000h ; Point IY to array_3[0]
LOOP: LDA 256,X ; Get array_2[i]
LDB ,X+ ; Get array_1[i]; increment i
MUL ; Multiply them
STD ,Y++ ; Put it away and move on twice
CMPX #20FFh ; Past the last element (20FFh) yet?
BLS LOOP ; IF not past it THEN repeat
RTS ; ELSE finished
Accumulator Offset from Register, A,R / B,R / D,R
op-code post-byte
As an alternative to a constant offset, any Accumulator can hold a variable offset
to an Index register, for example:
LDA B,X ; [A] ← [X+B]
One of the major advantages of the Relative address mode is that it produces
position independent code (PIC). Thus a Branch is relative to where the program
is at the time the decision is taken. If the program is moved to a different part
of memory, all the offsets move with it unchanged. This is what differentiates
a Branch from a Jump operation. The Program Counter Offset mode extends
the PIC capability to any instruction which has an Indexed address mode. This is
similar to the Constant Offset from Register mode, but with the Program Counter
being the Index register. For example in:
LDA 200h,PC ; [A] ← [PC+200h]
EXAMPLE PROGRAMS 41
We first met the Load Effective Address (LEA) instruction in Table 2.2. Here
we observed that it could be used to perform simple arithmetic on the X, Y, U or S
registers. Essentially, any effective address computed by any of the Direct Index
address modes, except Post-Increment/Pre-Decrement, can be loaded into one of
these four registers. A few examples are:
LEAX +2,X ; The EA of X+2 is put into X, effectively incrementing X by 2
LEAY D,X ; Adds [D] to [X] and puts sum in Y
LEAS -20,S ; Moves the Stack Pointer down 20 bytes
2.3 Example Programs
Previously we have used program fragments to illustrate various instruction/address
mode combinations. Here we conclude our look at 6809 assembly-level software
by developing three programs of a slightly more elaborate nature. This will serve
to integrate at least some of the concepts we have discussed, and provide for a
comparison with equivalent software using 68000 code in Chapter 4. Each program
module is written in the form of a complete subroutine; that is, data is
assumed to be present on entry in some place, usually in a register, and each module
is terminated by a ReTurn from Subroutine (RTS) instruction. Subroutine structure is
the subject of Chapter 5.
Implementing a software function involves developing an appropriate algo-
rithm, writing code in a suitable language, testing and debugging. There is little
that can be done to mechanize the first of these, as algorithms are an expression of
human creativity. Once this has been done, a range of software tools, such as
assemblers, linkers, compilers and simulators, is available to aid in the later
phases. We will look at these in some detail in Part 2.
The most fundamental software tool is the assembler. An assembler is a program
that translates, on a line-for-line basis, symbolically-coded native language
to machine code for the target processor. This saves the error-prone tedium
of working out op-codes and relative offsets. Nearly as important is the use of
mnemonics for instructions and names for locations (labels). These, together
with the use of comments, provide superior documentation compared to strings
of hexadecimal digits (see page 168).
At this point in the text, we are only concerned to provide sufficient back-
ground to allow the reader to follow program syntax as presented in the re-
mainder of the text. Assemblers, like any other commercial package (such as
a word processor), have their own peculiar rules and peccadilloes, which have to
be learnt. One common denominator is the virtually unanimous use of the pro-
cessor manufacturers' standard instruction mnemonics, with minor variations.
Most of the variations lie in the layout of the source code and the directives (or
pseudo operators) used to pass information from the programmer to the assem-
bler.
A line of source code comprises four fields: an optional label, the essential