Z80 Divide by 10

Sean Conner spc at conman.org
Sun Jan 27 02:35:20 CST 2008


It was thus said that the Great dwight elvey once stated:
> 
> 
> 
> 
> > Date: Sat, 26 Jan 2008 17:21:56 -0500
> > From: spc at conman.org
> > To: cctalk at classiccmp.org
> > Subject: Re: Z80 Divide by 10
> >
> > It was thus said that the Great dwight elvey once stated:
> >>
> >> 230 clock cycles and no conditionals
> >> Dwight
> >
> > I came across this bit of code from http://www.hackersdelight.org/ to
> > divide by 10:
> >
> > unsigned int div10(unsigned int n)
> > {
> > unsigned int q;
> > unsigned int r;
> >
> > q = (n>> 1) + (n>> 2);
> > q = q + (q>> 4);
> > q = q + (q>> 8);
> > q = q + (q>> 16);
> > q = q>> 3;
> > r = n - ((q << 3) + (q << 1));
> > return (q + ((r + 6)>> 4);
> > }
> >
> > On the Z80, the (q>> 8) is trivial to handle, and you can probably skip
> > the (q>> 16) statement all-to-gether (since you're only handling 16 bits
> > anyway, this reduces to 0 so the statement there can be skipped). My Z80
> > skills are pretty weak (and I don't have any references at hand right now)
> > but this looks pretty straight forward to translate.
> >
> Hi
>  I coded this up in Forth and it does good at the divide.
> The values q and r are not much good, meaning one still
> has to calculate the remainder, similar to the last part of my
> code.
>  The operation>> is really tough on a Z80 for 16 bit operations.
> One needs to do thing through the a register and use rra instructions
> or use the rr r instruction with a clearing of the carry before each
> single bit shift. Not having a parameterized shift like the 386 has
> makes it difficult.
>  I doubt one could beat the 230 cycles I had since each shift cost
> 12 cycles. This is 34 to 38 shifts depending on optimization or 408
> cycles minimum without the 7 16 bit adds. This is already more
> than the 230 cycles.

  Where do you get 34 shifts from?  I can see that being reduced down to
about 12 shifts:

	/* q = (n >> 1) + (n >> 2) */
	n >>= 1;
	q = n;
	n >>= 1;
	q += n;

	/* q = q + (q >> 4) */
	q' = q;
	q' >>= 1;
	q' >>= 1;
	q' >>= 1;
	q' >>= 1;
	q += q';

	/* q = q + (q >> 8) */
	q'(lo) = q(hi);
	q(lo) += q'(lo);
	q(hi) += CarryFlag;

	/* q = q + (q >> 16) */
	/* skipped because we're using 16 bits */

	/* q = q >> 3 */
	/* skipped because not needed */

	/* r = n - ((q << 3) + (q << 1)) */
	q' = q;
	q' >>= 1;
	q' >>= 1;
	q  += q'
	r = n - q;

	/* return q + ((r + 6) >> 4 */
	r += 6;
	r >>= 1;
	r >>= 1;
	r >>= 1;
	r >>= 1;
	q += r;

  That's now 144 cycles for shifts, with 6 16 bit adds and two 8 bit adds. 
Now that I have access to my Z80 references (but no real Z80 to test this
on), I think the following code will work:

;-------------------------------------
; DIV10		Perform division by 10
; Entry:	BC - 16 bit value
; Exit:		HL - quotient
;		everything else trashed.
;		If you need a remainder,
;		then you can do
;		REM = HL - ((HL << 3) + (HL << 1))
;------------------------------------

savehl	defw	0
const6	defw	6
			;		bytes	cycles
div10	ld	d,b	; save n	; 1	4
	ld	e,c			; 1	4


; q = (n >> 1) + (n >> 2)

	srl	b			; 2	8
	rr	c			; 2	8
	ld	h,b			; 1	4
	ld	l,c			; 1	4
	slr	b			; 2	8
	rr	c			; 2	8
	add	hl,bc			; 1	11

; q = q + (q >> 4)

	ld	b,h			; 1	4
	ld	c,l			; 1	4
	ld	(savehl),hl		; 3	16
	ld	hl,savehl		; 3	10
	and	a			; 1	4
	rrd				; 2	18
	ld	hl,(savehl)		; 3	16
	add	hl,bc			; 1	11

; q = q + (q >> 8)

	ld	b,0			; 2	7
	ld	c,h			; 1	4
	add	hl,bc			; 1	11

; q = q + (q >> 16) skipped
; q = q >> 3 skipped
; r = n - ((q << 3) + (q << 1))

	ld	b,h			; 1	4 
	ld	c,l			; 1	4
	srl	b			; 2	8
	rr	c			; 2	8
	srl	b			; 2	8
	rr	c			; 2	8
	add	hl,bc	; HL = q	; 1	11
	ld	b,h	; BC also = q	; 1	4
	ld	c,l			; 1	4
	ex	de,hl	; HL = n, DE = q; 1	4
	or	a	; CF = 0	; 1	4
	sbc	hl,de	; n - q		; 2	15

; return q + ((r + 6) >> 4

	ld	de,(const6)		; 4	20
	add	hl,de			; 1	11
	ld	(savehl),hl		; 3	16
	ld	hl,savehl		; 3	10
	and	a			; 1	4
	rdd				; 1	18
	ld	hl,(savehl)		; 3	16
	add	hl,bc			; 1	11

	ret				; 1	10
					;67    362

  Well ... crap.  362 cycles.  The Z80 really *sucks* on rotates, doesn't
it?  I thought the RDD instruction would help, but really, it just breaks
even on the cycle count over four repetitions of SLR/RR (at least, the way
I've coded it because of the bottleneck in the use of HL).

  -spc (Fun exercise though ... )




More information about the cctech mailing list